ISOLATION AND IDENTIFICATION OF MOUSE AND HUMAN
TRANSCRIPTION CONTROL ELEMENTS ASSOCIATED WITH
CYTOCHROME EXPRESSION

Field of the Invention

The present invention relates generally to the field of molecular biology and
medicine. In particular, the invention relates to transcription control elements derived
from genomic loci of the murine Cyp3A11 gene and the human CYP3A4 gene, as
well as methods of using the same. The invention further relates to isolated
polynucleotides derived from regulatory regions of the murine Cyp3A11 gene and the
human CYP3A4 gene, reporter constructs comprising those isolated polynucleotides,
cells transformed with those reporter constructs, transgenic animals comprising those
reporter constructs, and methods of using such cells and transgenic animals to
identify compounds that modulate expression mediated by transcription control
elements derived from the murine Cyp3A11 gene and the human CYP3A4 gene. The
invention further relates to in vivo assay methods which employ animals transfected
with such reporter constructs.

BACKGROUND OF THE INVENTION

Toxicology studies of substances have traditionally relied on unicellular
organisms (for example, the Ames test or the yeast carcinogenic assay described in
U.S. Patent No. 4,997,757) or in vitro systems for toxicity testing and the prediction
of human risk. However, many factors make it difficult to extrapolate
from such data to human risk, including cellular affinity of the substance, uptake and
distribution differences between single cells and whole animals, metabolism of the
substance, and cascade effects where the effect of the substance is mediated through a
cellular process. These same factors can also affect the progress of pharmaceutical
research and development when attempting to determine and/or predict
the effects of an analyte in an animal system.

Further, the end-point of traditional animal-based toxicology studies is
typically determination of an LD50 (the dose at which 50% of the test animals die).
Dead animals may be subjected to further analysis, for example, histopathology, but
such analysis is generally labor intensive and relatively insensitive. MacGregor et al.
(Fundamental and Applied Toxicology, 26:156-173, 1995) have reviewed molecular
end-points and methods of routine toxicity testing, including the following: damage-inducible
genes in individual cells; bacterial models of toxicity; screening of stress-gene
expression using hybridization or polymerase chain reaction; hybridization
probes for detection of chromosomal aberrations; single cell electrophoresis assays;
and in vivo animal studies involving animal sacrifice and subsequent analysis of
tissue/cellular damage.

P450 enzymes have been shown to be involved in the biosynthesis of steroids
and cholesterols and in metabolizing drugs or xenobiotics. P450 enzyme induction is
a result of fluctuations in levels of steroids and cholesterols, or of repeated exposure
to drugs or xenobiotics. Changes in P450 enzyme levels result in changes in plasma
and/or tissue levels of the drugs they metabolize, which in turn affects the stability,
efficacy and toxicity of those drugs. Among the P450 superfamilies, the Cyp3A family
typically accounts for 14-31% of total P450 present in human liver microsomes and
for 50-60% of the drug metabolic activity (Toide et al. (1997) Arch. Biochem.
Biophys. 338:43-49). Clones encoding distinct Cyp3A forms have been isolated
from human, rat, guinea pig and mouse, including Cyp3A11 in mice and CYP3A4 in
humans. Therefore, P450 enzyme expression, particularly of the Cyp3A family of
genes, is a vital pharmacological parameter of the bioavailability of pharmaceutical
agents, as well as of drug-to-drug interactions.

Currently, conventional assays for P450 gene regulation, for example Northern
blots, Western blots, RT-PCR or ex vivo reporter assays, are laborious and
time-consuming. In addition, expression of P450 genes in cell lines has proven difficult.
Thus, there remains a need to directly monitor P450 gene regulation in real time in
live animals.



Brief Description of the Drawings

Figure 1A (SEQ ID NO: 12) comprises the nucleotide sequence of a
transcriptional control element from the mouse Cyp3A11 gene locus. In the figure,
the sequence represents 12,275 nucleotides in total, the translational start codon
(ATG) is located at positions 11,003-11,005, a TATA box is located at positions
10,884 to 10,887, and a major transcription start site begins with the C at position 10,914.
An approximately 9.3 kb region of the Cyp3A11 gene is from nucleotide position 1 to
9,330 of Figure 1A, and the approximately 9.3 kb sequence is presented alone in
Figure 1B (SEQ ID NO: 13).
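The positions given for Figure 1A are 1-based and inclusive. As a minimal sketch, assuming the SEQ ID NO: 12 sequence has been read into a plain Python string (no sequence data is reproduced here, and the short `toy` string below is a hypothetical stand-in), the region 5' of the start codon and the approximately 9.3 kb Figure 1B region could be sliced as follows. The helper names `subseq` and `upstream_of_atg` are illustrative only.

```python
def subseq(seq: str, start: int, end: int) -> str:
    """Return nucleotides start..end (1-based, inclusive) of seq."""
    if not (1 <= start <= end <= len(seq)):
        raise ValueError(f"range {start}-{end} outside 1-{len(seq)}")
    # Python slices are 0-based and end-exclusive, so only the start is shifted.
    return seq[start - 1:end]


def upstream_of_atg(seq: str, atg_start: int) -> str:
    """Return everything 5' of the start codon, i.e. positions 1..atg_start-1."""
    if subseq(seq, atg_start, atg_start + 2).upper() != "ATG":
        raise ValueError(f"no ATG at position {atg_start}")
    return subseq(seq, 1, atg_start - 1)


# Hypothetical toy sequence standing in for SEQ ID NO: 12.
toy = "GGCTATAAAGCCATGCC"
print(upstream_of_atg(toy, 13))  # GGCTATAAAGCC
print(subseq(toy, 4, 7))         # TATA
```

Applied to the full 12,275-nucleotide sequence, `upstream_of_atg(seq, 11003)` would return nucleotides 1-11,002, and `subseq(seq, 1, 9330)` would correspond to the Figure 1B region.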
Figure 2 is a schematic of the pGL3-I-3A11S vector construct containing the
1.6 kb Cyp3A11 promoter sequence.

Figure 3 is a schematic of the pGL3-I-3A11M vector construct containing the
6.0 kb Cyp3A11 promoter sequence.

Figure 4 is a schematic of the pGL3-I-3A11L vector construct containing the
9.0 kb Cyp3A11 promoter sequence.

Figure 5 is a schematic of the pBSSK-3A11S vector construct containing the
1.6 kb Cyp3A11 promoter sequence.

Figure 6 is a schematic of the pBSSK-3A11M vector construct containing the
6 kb Cyp3A11 promoter sequence.

Figure 7 is a schematic of the pGL3-I-Basic vector construct.

Figure 8 is a schematic of the pGL3-I-3A4M vector construct containing the
10 kb CYP3A4 promoter sequence.

Figure 9 is a schematic of the pGL3-I-3A4L vector construct containing the 13
kb CYP3A4 promoter sequence.
Figure 10 depicts the results of liver push experiments wherein FVB mice
were liver-pushed with 5 µg of the pGL3-I-3A11M construct and 0 µg of hPXR
plasmid.

Figure 11, panels A-D, depict the results of hPXR titration experiments
performed in order to optimize the amount of hPXR plasmid co-administered with 5
µg of pGL3-I-3A11M. Panel A, 0 µg hPXR + 5 µg 3A11M-luc. Panel B, 1 µg hPXR
+ 5 µg 3A11M-luc. Panel C, 2 µg hPXR + 5 µg 3A11M-luc. Panel D, 5 µg hPXR + 5
µg 3A11M-luc.

Figure 12 depicts the results of liver push experiments wherein FVB mice
were liver-pushed with 5 µg of the pGL3-I-3A11L construct and 1 µg of hPXR
plasmid.

Figure 13, panels A-D, depict the results of hPXR titration experiments
performed in order to optimize the amount of hPXR plasmid co-administered with 5
µg of pGL3-I-3A11L. Panel A, 0 µg hPXR + 5 µg 3A11L-luc. Panel B, 1 µg hPXR +
5 µg 3A11L-luc. Panel C, 2 µg hPXR + 5 µg 3A11L-luc. Panel D, 5 µg hPXR + 5 µg
3A11L-luc.

Figure 14 depicts the results of liver push experiments wherein FVB mice
were liver-pushed with 5 µg of the pGL3-I-3A4L construct and 0 µg of hPXR
plasmid.

Figure 15 depicts the results of liver push experiments wherein FVB mice
were liver-pushed with 5 µg of the pGL3-I-3A4L construct and 1 µg of hPXR
plasmid.

Figure 16, panels A-D, depict the results of hPXR titration experiments
performed in order to optimize the amount of hPXR plasmid co-administered with 5
µg of pGL3-I-3A4L. Panel A, 0 µg hPXR + 5 µg 3A4L-luc. Panel B, 1 µg hPXR + 5
µg 3A4L-luc. Panel C, 2 µg hPXR + 5 µg 3A4L-luc. Panel D, 5 µg hPXR + 5 µg
3A4L-luc.

Figure 17A (SEQ ID NO: 14) comprises the nucleotide sequence of a
transcriptional control element from the human CYP3A4 gene locus. In the figure, the
sequence represents 13,035 nucleotides in total, the translational start codon (ATG) is
located at positions 13,033 to 13,035, a TATA box is located at positions 12,901 to
12,904, and a major transcription start site begins with the A at position 12,930. An
approximately 2.5 kb region of the CYP3A4 gene, useful to facilitate expression as
described herein, is from nucleotide position 1 to 2,461 of Figure 17A, and the
approximately 2.5 kb sequence is presented alone in Figure 17B (SEQ ID NO: 15).
Figure 17C (SEQ ID NO: 17) presents the entire sequence of the CYP3A4-luc transgene
used to generate FVB/N-TgN(CYP3A4-luc) mice.
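Figures 1A and 17A give absolute positions; expressing them relative to each major transcription start site (TSS) makes the two loci easier to compare. The sketch below only re-expresses the coordinates stated above. The numbering convention it uses (TSS = +1, the nucleotide immediately upstream = -1, no position 0) is a common convention assumed here, not one stated in the source, and `relative_to_tss` is a hypothetical helper name.

```python
def relative_to_tss(pos: int, tss: int) -> int:
    """Convert an absolute 1-based position to TSS-relative numbering.

    Assumed convention: the TSS is +1 and the nucleotide immediately
    upstream is -1 (there is no position 0).
    """
    return pos - tss + 1 if pos >= tss else pos - tss


# Coordinates as stated for Figure 1A (mouse) and Figure 17A (human).
loci = {
    "Cyp3A11": {"tata": 10_884, "tss": 10_914, "atg": 11_003},
    "CYP3A4": {"tata": 12_901, "tss": 12_930, "atg": 13_033},
}

for name, c in loci.items():
    tata = relative_to_tss(c["tata"], c["tss"])
    atg = relative_to_tss(c["atg"], c["tss"])
    leader = c["atg"] - c["tss"]  # nucleotides from the TSS up to, not including, the ATG
    print(f"{name}: TATA at {tata}, ATG at +{atg}, 5' leader {leader} nt")
```

With these inputs the loop prints a TATA box starting 30 (mouse) and 29 (human) nucleotides upstream of the start site, and 5' leaders of 89 and 103 nucleotides respectively.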

Figure 18 presents a schematic diagram of an approximately 9.3 kb promoter
region sequence, located 5' to the Cyp3A11 coding sequences in the mouse genome,
where the diagram shows the approximate locations of repeat elements from two
known families.

Figure 19 presents a schematic diagram of an approximately 13 kb promoter
region sequence, located 5' to the CYP3A4 coding sequences in the human genome,
where the diagram shows the approximate locations of repeat elements from two
known families.
Figure 20 presents exemplary results of PCR screening of CYP3A4-luc Tg mice.

Figure 21 presents exemplary Southern hybridization analysis data for
FVB/N-TgN(CYP3A4-luc) mice.

Figure 22, panels A, B, C, and D, present exemplary results of the effects of
xenobiotics on expression of CYP3A4-luc in the #82 line of Tg FVB mice.

Summary of the Invention 

The present invention relates to transcription control elements derived from
mouse and human genes associated with cytochrome expression, e.g., Cyp3A11 and
CYP3A4, respectively. The present invention comprises isolated polynucleotides,
expression cassettes, vectors, recombinant cells, liver-push non-human animals and
transgenic non-human animals that comprise the transcription control elements
described herein.

In one aspect, the present invention relates to transcription control elements
derived from cytochrome P450 genes (e.g., Cyp3A11 and CYP3A4), expression
cassettes which include those control elements, vector constructs, cells and transgenic
animals containing the expression cassettes, and methods of using the cells and
transgenic animals containing the expression cassettes, for example, as modeling,
screening and/or test systems. Methods of using the control elements, expression
cassettes, cells, and transgenic animals of the present invention include, but are not
limited to, studies involving toxicity and drug metabolism, and methods for screening
drug metabolism, safety and/or possible toxicity. Exemplary transcription control
elements useful in the practice of the present invention include those derived from
the mouse Cyp3A11 locus and those derived from the human CYP3A4 locus.

In particular, the invention relates to transcription control elements derived
from genomic loci of the murine Cyp3A11 gene and the human CYP3A4 gene,
wherein these transcription control elements are associated with a reporter sequence.
Specifically, recombinant nucleic acid molecules comprising SEQ ID NO: 12, SEQ
ID NO: 13, SEQ ID NO: 14, and SEQ ID NO: 15, as well as fragments thereof, are
described. The invention further relates to in vivo assay methods which employ
animals transfected with such reporter constructs.

In one aspect, the present invention comprises a polynucleotide, or fragments
thereof typically greater than 100 contiguous nucleotides, derived from the mouse
Cyp3A11 gene, the polynucleotide (or fragments thereof) having at least 95% identity
to nucleotides 1-11,002 of SEQ ID NO: 12 (or corresponding fragments thereof). The
polynucleotide (or fragments thereof) may be operably linked to a coding sequence of
interest. The polynucleotide (or fragments thereof) typically comprises at least one
transcriptional control element. An expression cassette may comprise the
polynucleotide and coding sequence of interest.
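The phrase "fragments thereof typically greater than 100 contiguous nucleotides" can be made concrete with a small check. This is a minimal sketch under stated assumptions: it tests exact, same-strand substring matches only (the identity-based variants recited above would need an alignment, which this does not attempt), and the names `is_contiguous_fragment` and `windows` are hypothetical.

```python
from typing import Iterator, Tuple


def is_contiguous_fragment(fragment: str, reference: str, min_len: int = 101) -> bool:
    """True if fragment occurs exactly and contiguously in reference and is
    longer than 100 nucleotides (hence the default min_len of 101)."""
    fragment, reference = fragment.upper(), reference.upper()
    return len(fragment) >= min_len and fragment in reference


def windows(reference: str, size: int, step: int) -> Iterator[Tuple[int, str]]:
    """Yield (1-based start, window) for contiguous windows tiled along reference."""
    for i in range(0, len(reference) - size + 1, step):
        yield i + 1, reference[i:i + size]


# Hypothetical 400-nt reference used only to exercise the functions.
ref = "ACGT" * 100
print(is_contiguous_fragment(ref[10:150], ref))  # True (140 nt)
print(is_contiguous_fragment(ref[:100], ref))    # False (exactly 100 nt)
```

Tiling windows in this way is one simple route to generating candidate fragments of a promoter region for deletion or reporter studies; the window size and step are free choices.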

In another aspect, the present invention comprises a polynucleotide, or
fragments thereof typically greater than 100 contiguous nucleotides, derived from the
mouse Cyp3A11 gene, the polynucleotide (or fragments thereof) having at least 95%
identity to the sequence of SEQ ID NO: 13 (or fragments thereof of at least about 100
contiguous nucleotides of SEQ ID NO: 13). The polynucleotide (or fragments thereof)
may be operably linked to a coding sequence of interest. The polynucleotide (or
fragments thereof) typically comprises at least one transcriptional control element.
An expression cassette may comprise the polynucleotide and coding sequence of
interest. In one embodiment, the polynucleotide comprises a first polynucleotide
having 95% identity or greater to nucleotides 5104-6218 of SEQ ID NO: 13 and a
second polynucleotide having 95% identity or greater to nucleotides 6792-9330 of
SEQ ID NO: 13.
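The embodiment above is defined by two sub-regions of SEQ ID NO: 13 (positions 5104-6218, i.e. 1,115 nucleotides, and 6792-9330, i.e. 2,539 nucleotides) and a 95% identity threshold. The sketch below assumes the simplest possible case, an ungapped position-by-position comparison of two equal-length sequences; it is an illustration of the arithmetic, not a substitute for a proper alignment tool, and the function names are hypothetical.

```python
def region_length(start: int, end: int) -> int:
    """Length of a 1-based inclusive region."""
    return end - start + 1


def percent_identity_ungapped(a: str, b: str) -> float:
    """Percent of positions with identical bases; assumes no gaps are needed."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def meets_threshold(a: str, b: str, threshold: float = 95.0) -> bool:
    """True if the ungapped identity is at least the threshold (default 95%)."""
    return percent_identity_ungapped(a, b) >= threshold


print(region_length(5104, 6218), region_length(6792, 9330))  # 1115 2539
```

Under this simplification, a variant of the 1,115-nucleotide region could differ at up to 55 positions (1,115 x 0.05 = 55.75, rounded down) and still meet the 95% threshold.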

In one aspect, the present invention includes isolated polynucleotides and/or
expression cassettes comprising a polynucleotide having at least about 95% identity to
the sequence of SEQ ID NO: 13, or fragments thereof, operably linked to a coding
sequence of interest, wherein the polynucleotide or fragments thereof comprise at
least one transcriptional control element.

In another aspect, the present invention includes isolated polynucleotides
and/or expression cassettes comprising a polynucleotide having at least about 95%
identity to the sequence of SEQ ID NO: 15, or fragments thereof, operably linked to a
coding sequence of interest, wherein the polynucleotide or fragments thereof
comprise at least one transcriptional control element.

In some embodiments the coding sequence of interest is a reporter sequence,
for example, a sequence encoding a light-generating protein. Such light-generating
proteins include bioluminescent proteins (including, but not limited to, procaryotic or
eucaryotic luciferase) and fluorescent proteins (including, but not limited to, blue
fluorescent protein, cyan fluorescent protein, green fluorescent protein, yellow
fluorescent protein, and red fluorescent protein, as well as enhanced and/or
destabilized variants thereof).

The present invention also includes vectors comprising the isolated
polynucleotides and/or expression cassettes of the present invention. Such vectors
typically include a vector backbone, and may be linear or circular, comprise one or
more origins of replication (e.g., a shuttle vector), be site-specifically or randomly
integrating, and comprise one or more selectable or screenable markers.

In one embodiment the present invention includes cells comprising the
expression cassettes and/or vectors of the present invention. In another embodiment,
transgenic non-human animals (e.g., rodents, including, but not limited to, mice, rats,
hamsters, gerbils, and guinea pigs) may comprise the expression cassettes and/or
vectors of the present invention. In a further embodiment, the present invention
includes non-human animals that comprise a subset of cells comprising the expression
cassettes and/or vectors of the present invention, for example, non-human animals
whose livers comprise cells transfected with the constructs of the present invention.
Such non-human animals may be generated, for example, by administration of the
expression cassettes and/or vectors of the present invention via intravenous injection.

In yet another aspect, the present invention includes methods of using the
expression cassettes, vectors, cells, and non-human animals of the present invention.
In one embodiment, the invention includes a method for identifying an analyte that
modulates expression (for example, of a reporter sequence) mediated by mouse
Cyp3A11 gene-derived transcription control elements and/or human CYP3A4
gene-derived transcription control elements in a transgenic, living, non-human animal.
Such a method typically comprises administering an analyte (e.g., a drug) to the
animal. The animal comprises one or more of the expression cassettes or vectors of the
present invention, typically including a reporter sequence. Expression of the reporter
sequence is monitored. An effect on the level of expression of the reporter sequence
indicates that the analyte affects expression of the gene corresponding to the
transcriptional control elements contained in the expression cassettes and/or
vectors employed in the method.
Another method comprises identifying an analyte that modulates expression
(for example, of a reporter sequence) mediated by mouse Cyp3A11 gene-derived
transcription control elements and/or human CYP3A4 gene-derived transcription
control elements in a transgenic, living, non-human animal. In this method a vector
mixture comprising an expression cassette of the present invention is administered to
the animal concomitant with, before, or after administration of an analyte. The vector
mixture comprises one or more of the expression cassettes of the present invention,
typically including a reporter sequence. Expression of the reporter sequence is
monitored. An effect on the expression of the reporter sequence indicates that the
analyte affects expression mediated by the transcriptional control elements
contained in the expression cassettes and/or vectors employed in the method. In one
embodiment the vector mixture is administered by intravenous injection.

In a further embodiment of the present invention, the expression cassettes
comprising the transcription control elements of the present invention and a reporter
are used to monitor the expression of the mouse Cyp3A11 gene or the human
CYP3A4 gene in a cell. In this embodiment expression of a reporter sequence is
monitored in the cell, and expression of the reporter sequence corresponds to
expression of the gene corresponding to the transcriptional control elements
contained in the expression cassettes and/or vectors employed in the method. Further,
analytes may be screened in such cells, wherein an effect on the expression of the
reporter sequence indicates that the analyte affects expression mediated by the
transcriptional control elements contained in the expression cassettes and/or vectors
employed in the method.

In another aspect, the present invention comprises a transgenic, non-human
animal, e.g., a rodent. The transgenic, non-human animal typically comprises an
expression cassette comprising a polynucleotide derived from the human CYP3A4
gene, the polynucleotide having at least 95% identity to nucleotides 1-13,032
of SEQ ID NO: 14 (or fragments thereof), wherein (i) the polynucleotide (or
fragments thereof) is operably linked to a coding sequence of interest, (ii) the
polynucleotide (or fragments thereof) comprises at least one transcriptional control
element, and (iii) expression of the coding sequence of interest is induced in the liver
of the living, transgenic, non-human animal by dexamethasone or rifampicin.
In one embodiment, expression of the coding sequence of interest is induced
in the living, transgenic animal by dexamethasone administered at 50 mg/kg body
weight, and/or expression of the coding sequence of interest is induced in the living,
transgenic animal by rifampicin administered at 50 mg/kg body weight. Further,
induction of expression of the coding sequence of interest by dexamethasone may be
greater than or equal to 10-fold over basal levels, and/or induction of expression of
the coding sequence of interest by rifampicin may be greater than or equal to two-fold
over basal levels.
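The induction criteria above are ratios of induced to basal reporter expression. A minimal sketch of that arithmetic follows; the readings are hypothetical values in arbitrary units, not data from the figures, and the function names are illustrative.

```python
from typing import Dict


def fold_induction(induced: float, basal: float) -> float:
    """Ratio of reporter signal after treatment to the basal (untreated) signal."""
    if basal <= 0:
        raise ValueError("basal signal must be positive")
    return induced / basal


# Minimum fold induction over basal levels, as stated for each inducer.
THRESHOLDS: Dict[str, float] = {"dexamethasone": 10.0, "rifampicin": 2.0}


def meets_induction_criterion(inducer: str, induced: float, basal: float) -> bool:
    """True if the observed fold induction reaches the stated minimum."""
    return fold_induction(induced, basal) >= THRESHOLDS[inducer]


# Hypothetical liver reporter readings in arbitrary units (not data from the source).
print(meets_induction_criterion("dexamethasone", induced=2.4e7, basal=1.6e6))  # True (15-fold)
print(meets_induction_criterion("rifampicin", induced=2.4e6, basal=1.6e6))     # False (1.5-fold)
```

Keeping the thresholds in a single table makes it straightforward to add further inducers if other fold-change criteria are defined.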

In one embodiment, basal expression of the coding sequence in the liver
region of the living, transgenic, non-human animal is greater than or equal to that in
other regions of the body of the living, transgenic, non-human animal.

In another embodiment, the transgenic, non-human animal does not have
sequences encoding a functional hPXR (human pregnane X receptor, a
rifampicin-responsive receptor). That is, the animal does not express a functional
human PXR gene product.

In a further embodiment, expression of the coding sequence of interest is
induced in the living, transgenic, non-human animal by at least one compound
selected from the group consisting of phenobarbital, nifedipine,
5-pregnen-3β-ol-20-one-16α-carbonitrile and clotrimazole, wherein induction of
expression is seen in the liver region of the living, transgenic animal.

In the transgenic, non-human animal the coding sequence of interest may, for
example, be a reporter sequence. Such a reporter sequence may, for example, encode
a light-generating protein (e.g., a bioluminescent protein or a fluorescent protein).
One exemplary bioluminescent protein is luciferase. In one embodiment of the
invention, the transgenic, non-human animal may include an expression cassette
comprising SEQ ID NO: 17 (an exemplary CYP3A4-luc transgene). Exemplary
fluorescent proteins include, but are not limited to, blue fluorescent protein, cyan
fluorescent protein, green fluorescent protein, yellow fluorescent protein, and red
fluorescent protein.

The transgenic, non-human animal may be a rodent, including, but not limited
to, a mouse, rat, hamster, gerbil, or guinea pig.

The present invention also includes a method for identifying an analyte that
modulates expression of a reporter sequence, wherein expression of the reporter
sequence is mediated by transcription control elements derived from, for example, a
human CYP3A4 gene, in a transgenic, living rodent. In the method the analyte is
administered to the transgenic, living, non-human animal described above.
Expression of the reporter sequence is monitored. An effect on the level of
expression of the reporter sequence indicates that the analyte affects expression
mediated by the transcription control elements, e.g., those derived from the human
CYP3A4 gene.

These and other embodiments of the present invention will readily occur to 
those of ordinary skill in the art in view of the disclosure herein. 

Detailed Description of the Invention

The practice of the present invention will employ, unless otherwise indicated,
conventional techniques of molecular biology, microbiology, cell biology, transgenic
animal manipulation, and recombinant DNA, which are within the skill of the art.
See, e.g., Sambrook, Fritsch, and Maniatis, MOLECULAR CLONING: A
LABORATORY MANUAL, 2nd edition (1989); CURRENT PROTOCOLS IN
MOLECULAR BIOLOGY (F.M. Ausubel et al. eds., 1987); the series METHODS
IN ENZYMOLOGY (Academic Press, Inc.); PCR 2: A PRACTICAL APPROACH
(M.J. McPherson, B.D. Hames and G.R. Taylor eds., 1995); ANIMAL CELL
CULTURE (R.I. Freshney ed., 1987); "Transgenic Animal Technology: A
Laboratory Handbook," by Carl A. Pinkert (Editor), First Edition, Academic Press,
ISBN: 0125571658; and "Manipulating the Mouse Embryo: A Laboratory Manual,"
Brigid Hogan et al., ISBN: 0879693843, Cold Spring Harbor Laboratory
Press, September 1999, Second Edition.

1. DEFINITIONS

In describing the present invention, the following terms will be employed, and
are intended to be defined as indicated below. Unless otherwise indicated, all terms
used herein have the same meaning as they would to one skilled in the art of the
present invention.

The terms "nucleic acid molecule" and "polynucleotide" are used
interchangeably and refer to a polymeric form of nucleotides of any length, either
deoxyribonucleotides or ribonucleotides, or analogs thereof. Polynucleotides may
have any three-dimensional structure, and may perform any function, known or
unknown. Non-limiting examples of polynucleotides include a gene, a gene fragment,
exons, introns, messenger RNA (mRNA), transfer RNA, ribosomal RNA, ribozymes,
cDNA, recombinant polynucleotides, branched polynucleotides, plasmids, vectors,
isolated DNA of any sequence, isolated RNA of any sequence, nucleic acid probes,
and primers.

A polynucleotide is typically composed of a specific sequence of four
nucleotide bases: adenine (A); cytosine (C); guanine (G); and thymine (T) (uracil (U)
replaces thymine (T) when the polynucleotide is RNA). Thus, the term "polynucleotide
sequence" refers to the alphabetical representation of a polynucleotide molecule. This
alphabetical representation can be input into databases in a computer having a central
processing unit and used for bioinformatics applications such as functional genomics
and homology searching.
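Because a polynucleotide sequence is handled as a string over a four-letter alphabet, simple bioinformatics operations reduce to string processing. The sketch below assumes plain sequences without IUPAC ambiguity codes such as N; it normalizes a printed sequence (dropping position numbers and spaces), validates it as DNA, and writes the corresponding RNA by substituting U for T. All names are illustrative.

```python
DNA_ALPHABET = set("ACGT")
RNA_ALPHABET = set("ACGU")


def normalize(seq: str) -> str:
    """Upper-case and keep letters only, dropping the spaces and position
    numbers that appear in printed sequence listings."""
    return "".join(ch for ch in seq.upper() if ch.isalpha())


def is_dna(seq: str) -> bool:
    """True if the normalized sequence is non-empty and uses only A, C, G, T."""
    s = normalize(seq)
    return bool(s) and set(s) <= DNA_ALPHABET


def dna_to_rna(seq: str) -> str:
    """Write the RNA counterpart of a DNA sequence by substituting U for T."""
    s = normalize(seq)
    if not set(s) <= DNA_ALPHABET:
        raise ValueError("sequence contains non-ACGT characters")
    return s.replace("T", "U")


print(dna_to_rna("1 atgcttact 10"))  # AUGCUUACU
```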

A "coding sequence" or a sequence which "encodes" a selected polypeptide is
a nucleic acid molecule which is transcribed (in the case of DNA) and translated (in
the case of mRNA) into a polypeptide, for example, in vivo when placed under the
control of appropriate regulatory sequences (or "control elements"). The boundaries
of the coding sequence are typically determined by a start codon at the 5' (amino)
terminus and a translation stop codon at the 3' (carboxy) terminus. A coding sequence
can include, but is not limited to, cDNA from viral, procaryotic or eucaryotic mRNA,
genomic DNA sequences from viral or procaryotic DNA, and even synthetic DNA
sequences. A transcription termination sequence may be located 3' to the coding
sequence. Other "control elements" may also be associated with a coding sequence.
A DNA sequence encoding a polypeptide can be optimized for expression in a
selected cell by using the codons preferred by the selected cell to represent the DNA
copy of the desired polypeptide coding sequence. "Encoded by" refers to a nucleic
acid sequence which codes for a polypeptide sequence, wherein the polypeptide
sequence or a portion thereof contains an amino acid sequence of at least 3 to 5 amino
acids, more preferably at least 8 to 10 amino acids, and even more preferably at least
15 to 20 amino acids from a polypeptide encoded by the nucleic acid sequence. Also
encompassed are polypeptide sequences which are immunologically identifiable with
a polypeptide encoded by the sequence.
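Codon optimization as described above can be illustrated, in its simplest form, by back-translating a protein using the selected cell's most preferred codon for each amino acid. The codon table below is a hypothetical, truncated example that does not reflect measured codon usage for any organism, and `back_translate` is an illustrative name; practical optimization tools typically weigh additional factors that this sketch ignores.

```python
from typing import Dict, List

# Hypothetical, truncated preference table: codons listed from most to least
# preferred for an imagined host cell. NOT measured codon-usage data.
PREFERRED_CODONS: Dict[str, List[str]] = {
    "M": ["ATG"],
    "K": ["AAG", "AAA"],
    "L": ["CTG", "TTG", "CTC"],
    "S": ["AGC", "TCC", "TCT"],
    "*": ["TAA", "TGA", "TAG"],  # stop codons
}


def back_translate(protein: str, table: Dict[str, List[str]] = PREFERRED_CODONS) -> str:
    """Encode each residue with the first (most preferred) codon in the table."""
    try:
        return "".join(table[aa][0] for aa in protein.upper())
    except KeyError as err:
        raise ValueError(f"no codon listed for residue {err.args[0]!r}") from None


print(back_translate("MKLS*"))  # ATGAAGCTGAGCTAA
```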
A "transcription factor" typically refers to a protein (or polypeptide) which
affects the transcription, and accordingly the expression, of a specified gene. A
transcription factor may refer to a single polypeptide transcription factor, one or more
polypeptides acting sequentially or in concert, or a complex of polypeptides.

Typical "control elements" include, but are not limited to, transcription
promoters, transcription enhancer elements, cis-acting transcription regulating
elements (transcription regulators, e.g., a cis-acting element that affects the
transcription of a gene, for example, a region of a promoter with which a transcription
factor interacts to induce or repress expression of a gene), transcription initiation
signals (e.g., a TATA box), basal promoters, transcription termination signals, as well
as polyadenylation sequences (located 3' to the translation stop codon), sequences for
optimization of initiation of translation (located 5' to the coding sequence), translation
enhancing sequences, and translation termination sequences. Transcription promoters
can include, for example, inducible promoters (where expression of a polynucleotide
sequence operably linked to the promoter is induced by an analyte, cofactor,
regulatory protein, etc.), repressible promoters (where expression of a polynucleotide
sequence operably linked to the promoter is repressed by an analyte, cofactor,
regulatory protein, etc.), and constitutive promoters.

"Expression enhancing sequences," also referred to as "enhancer sequences"
or "enhancers," typically refer to control elements that improve transcription or
translation of a polynucleotide relative to the expression level in the absence of such
control elements (for example, promoters, promoter enhancers, enhancer elements,
and translational enhancers (e.g., Shine-Dalgarno sequences)).

The term "modulation" refers to both inhibition, including partial inhibition, and
stimulation. Thus, for example, a compound that modulates expression of a
reporter sequence may either inhibit that expression, partially or completely, or
stimulate expression of the sequence.
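The definition of "modulation" covers stimulation as well as partial or complete inhibition. One way to label a screening result accordingly is sketched below. The tolerance band and background floor are hypothetical parameters chosen for illustration, since no numeric cut-offs are specified here, and the `Modulation` and `classify` names are illustrative.

```python
from enum import Enum


class Modulation(Enum):
    STIMULATION = "stimulation"
    PARTIAL_INHIBITION = "partial inhibition"
    COMPLETE_INHIBITION = "complete inhibition"
    NO_EFFECT = "no effect"


def classify(treated: float, control: float, tolerance: float = 0.2,
             floor: float = 0.0) -> Modulation:
    """Label reporter expression in a treated sample relative to its control.

    tolerance: fractional change still counted as 'no effect' (hypothetical default).
    floor: signal at or below which expression counts as completely inhibited,
           e.g. an instrument background level (hypothetical default).
    """
    if control <= 0:
        raise ValueError("control signal must be positive")
    if treated <= floor:
        return Modulation.COMPLETE_INHIBITION
    ratio = treated / control
    if ratio > 1 + tolerance:
        return Modulation.STIMULATION
    if ratio < 1 - tolerance:
        return Modulation.PARTIAL_INHIBITION
    return Modulation.NO_EFFECT


print(classify(300.0, 100.0).value)  # stimulation
```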

"Purified polynucleotide" refers to a polynucleotide of interest or fragment 
thereof which is essentially free, e.g., contains less than about 50%, preferably less
than about 70%, and more preferably less than about 90%, of the protein with which
the polynucleotide is naturally associated. Techniques for purifying polynucleotides
of interest are well known in the art and include, for example, disruption of the cell
containing the polynucleotide with a chaotropic agent and separation of the
polynucleotide(s) and proteins by ion-exchange chromatography, affinity
chromatography and sedimentation according to density.

A "heterologous sequence" typically refers to either (i) a nucleic acid
sequence that is not normally found in the cell or organism of interest, or (ii) a nucleic
acid sequence introduced at a genomic site wherein the nucleic acid sequence does
not normally occur in nature at that site. For example, a DNA sequence encoding a
polypeptide can be obtained from yeast and introduced into a bacterial cell. In this
case the yeast DNA sequence is "heterologous" to the native DNA of the bacterial
cell. Alternatively, a promoter sequence from a Tie2 gene can be introduced into the
genomic location of a fosB gene. In this case the Tie2 promoter sequence is
"heterologous" to the native fosB genomic sequence.

A "polypeptide" is used in its broadest sense to refer to a compound of two or
more subunit amino acids, amino acid analogs, or other peptidomimetics. The
subunits may be linked by peptide bonds or by other bonds, for example ester, ether,
etc. The term "amino acid" typically refers to natural and/or unnatural or
synthetic amino acids, including glycine and both the D and L optical isomers, and
amino acid analogs and peptidomimetics. A peptide of three or more amino acids is
commonly called an oligopeptide if the peptide chain is short. If the peptide chain is
long, the peptide is typically called a polypeptide or a protein.

"Operably linked" refers to an arrangement of elements wherein the
components so described are configured so as to perform their usual function. Thus, a
given promoter that is operably linked to a coding sequence (e.g., a reporter
expression cassette) is capable of effecting the expression of the coding sequence
when the proper enzymes are present. The promoter or other control elements need
not be contiguous with the coding sequence, so long as they function to direct the
expression thereof. For example, intervening untranslated yet transcribed sequences
can be present between the promoter sequence and the coding sequence and the
promoter sequence can still be considered "operably linked" to the coding sequence.

"Recombinant," as used herein to describe a nucleic acid molecule, means a
polynucleotide of genomic, cDNA, semisynthetic, or synthetic origin which, by virtue
of its origin or manipulation: (1) is not associated with all or a portion of the
polynucleotide with which it is associated in nature; and/or (2) is linked to a
polynucleotide other than that to which it is linked in nature. The term "recombinant"
as used with respect to a protein or polypeptide means a polypeptide produced by
expression of a recombinant polynucleotide. "Recombinant host cells," "host cells,"
"cells," "cell lines," "cell cultures," and other such terms denoting, e.g., procaryotic
microorganisms or eucaryotic cell lines cultured as unicellular entities, are used
interchangeably, and refer to cells which can be, or have been, used as recipients for
recombinant vectors or other transfer DNA, and include the progeny of the original
cell which has been transfected. It is understood that the progeny of a single parental
cell may not necessarily be completely identical in morphology or in genomic or total
DNA complement to the original parent, due to accidental or deliberate mutation.
Progeny of the parental cell which are sufficiently similar to the parent to be
characterized by the relevant property, such as the presence of a nucleotide sequence
encoding a desired peptide, are included in the progeny intended by this definition,
and are covered by the above terms.

An "isolated polynucleotide" molecule is a nucleic acid molecule separate and
discrete from the whole organism with which the molecule is found in nature; or a
nucleic acid molecule devoid, in whole or part, of sequences normally associated with
it in nature; or a sequence, as it exists in nature, but having heterologous sequences
(as defined herein) in association therewith.

Techniques for determining nucleic acid and amino acid "sequence identity"
also are known in the art. Typically, such techniques include determining the
nucleotide sequence of the mRNA for a gene and/or determining the amino acid
sequence encoded thereby, and comparing these sequences to a second nucleotide or
amino acid sequence. In general, "identity" refers to an exact nucleotide-to-
nucleotide or amino acid-to-amino acid correspondence of two polynucleotides or
polypeptide sequences, respectively. Two or more sequences (polynucleotide or
amino acid) can be compared by determining their "percent identity." The percent
identity of two sequences, whether nucleic acid or amino acid sequences, is the
number of exact matches between two aligned sequences divided by the length of the
shorter sequence and multiplied by 100. An approximate alignment for nucleic acid
sequences is provided by the local homology algorithm of Smith and Waterman,
Advances in Applied Mathematics 2:482-489 (1981). This algorithm can be applied
to amino acid sequences by using the scoring matrix developed by Dayhoff, Atlas of
Protein Sequences and Structure, M.O. Dayhoff ed., 5 suppl. 3:353-358, National
Biomedical Research Foundation, Washington, D.C., USA, and normalized by
Gribskov, Nucl. Acids Res. 14(6):6745-6763 (1986). An exemplary implementation
of this algorithm to determine percent identity of a sequence is provided by the
Genetics Computer Group (Madison, WI) in the "BestFit" utility application. The
default parameters for this method are described in the Wisconsin Sequence Analysis
Package Program Manual, Version 8 (1995) (available from Genetics Computer
Group, Madison, WI). A preferred method of establishing percent identity in the
context of the present invention is to use the MPSRCH package of programs
copyrighted by the University of Edinburgh, developed by John F. Collins and Shane
S. Sturrok, and distributed by IntelliGenetics, Inc. (Mountain View, CA). From this
suite of packages the Smith-Waterman algorithm can be employed where default
parameters are used for the scoring table (for example, gap open penalty of 12, gap
extension penalty of one, and a gap of six). From the data generated the "Match"
value reflects "sequence identity." Other suitable programs for calculating the percent
identity or similarity between sequences are generally known in the art; for example,
another alignment program is BLAST, used with default parameters. For example,
BLASTN and BLASTP can be used with the following default parameters: genetic
code = standard; filter = none; strand = both; cutoff = 60; expect = 10; Matrix =
BLOSUM62; Descriptions = 50 sequences; sort by = HIGH SCORE; Databases =
non-redundant, GenBank + EMBL + DDBJ + PDB + GenBank CDS translations +
Swiss protein + Spupdate + PIR. Details of these programs can be found at the
following internet address: http://www.ncbi.nlm.gov/cgi-bin/BLAST.
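The percent identity calculation described above can be illustrated with a short sketch. The Python example below is an illustration only and is not the BestFit, MPSRCH, or BLAST implementation: it performs a basic Smith-Waterman local alignment using a simple match/mismatch score and a linear gap penalty (an assumed simplification of the affine gap-open/gap-extension scoring mentioned above), then counts exact matches in the aligned region and divides by the length of the shorter input sequence, multiplied by 100. The function names `smith_waterman` and `percent_identity` and the scoring values are hypothetical.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2):
    """Local alignment of a and b with a linear gap penalty; returns the aligned strings."""
    def score(x: str, y: str) -> int:
        return match if x == y else mismatch

    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # first row and column stay 0
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            H[i][j] = max(
                0,                                            # start a new local alignment
                H[i - 1][j - 1] + score(a[i - 1], b[j - 1]),  # match or mismatch
                H[i - 1][j] + gap,                            # gap in b
                H[i][j - 1] + gap,                            # gap in a
            )
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(seq1: str, seq2: str) -> float:
    """Exact matches in the local alignment / length of the shorter sequence x 100."""
    if not seq1 or not seq2:
        raise ValueError("both sequences must be non-empty")
    aln1, aln2 = smith_waterman(seq1, seq2)
    matches = sum(1 for x, y in zip(aln1, aln2) if x == y and x != "-")
    return 100.0 * matches / min(len(seq1), len(seq2))


print(percent_identity("ACGTACGT", "ACGAACGT"))  # 87.5 (7 of 8 aligned positions match)
```

Production tools add substitution matrices (e.g., BLOSUM62 for proteins) and affine gap costs; the linear penalty here keeps the recurrence short.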

One of skill in the art can readily determine the proper search parameters to
use for a given sequence in the above programs. For example, the search parameters
may vary based on the size of the sequence in question. Thus, for example, a
representative embodiment of the present invention would include an isolated
polynucleotide comprising X contiguous nucleotides, wherein (i) the X contiguous
nucleotides have at least about 50% identity to Y contiguous nucleotides derived from
any of the sequences described herein, (ii) X equals Y, and (iii) X is equal to from 6
up to the number of nucleotides present in a selected full-length sequence as described
herein (e.g., see the Examples, Figures, Sequence Listing and claims), including all
integer values falling within the above-described ranges. A "fragment" of a
polynucleotide refers to a polynucleotide molecule of any length derived from a larger
polynucleotide described herein (i.e., Y contiguous nucleotides, where X=Y as just
described). Exemplary fragment lengths include, but are not limited to, at least about
6 contiguous nucleotides, at least about 50 contiguous nucleotides, about 100
contiguous nucleotides, about 200 contiguous nucleotides, about 250 contiguous
nucleotides, about 500 contiguous nucleotides, or at least about 1000 contiguous
nucleotides or more, wherein such contiguous nucleotides are derived from a larger
sequence of contiguous nucleotides.
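The fragment embodiment above (X contiguous nucleotides having at least about 50% identity to Y contiguous nucleotides of a disclosed sequence, with X equal to Y and X at least 6) can be checked with a simple sliding-window comparison. The sketch below assumes an ungapped, position-by-position comparison of equal-length windows; the function names and the example sequences are hypothetical and are not taken from the Sequence Listing.

```python
def window_identity(x: str, y: str) -> float:
    """Percent of positions at which two equal-length sequences carry the same base."""
    if len(x) != len(y):
        raise ValueError("X must equal Y")
    return 100.0 * sum(a == b for a, b in zip(x, y)) / len(x)


def best_window_identity(candidate: str, reference: str) -> float:
    """Highest identity of the candidate (X nt) to any Y = X contiguous nt of the reference."""
    x = len(candidate)
    if not 6 <= x <= len(reference):
        raise ValueError("X must range from 6 up to the length of the reference sequence")
    return max(
        window_identity(candidate, reference[start:start + x])
        for start in range(len(reference) - x + 1)
    )


def qualifies(candidate: str, reference: str, threshold: float = 50.0) -> bool:
    """True if some window of the reference shares at least `threshold` percent identity."""
    return best_window_identity(candidate, reference) >= threshold


reference = "ATGCGTACGTTAGC"             # hypothetical reference sequence
print(qualifies("ACGTTA", reference))    # True: exact copy of positions 7-12
print(qualifies("AAAAAA", reference))    # False: best window shares only 2 of 6 bases
```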

The purified polynucleotides and polynucleotides used in construction of
expression cassettes of the present invention include the sequences disclosed herein as
well as related polynucleotide sequences having sequence identity of approximately
80% to 100% and integer values therebetween. Typically the percent identities
between the sequences disclosed herein and the claimed sequences are at least about
80-85%, preferably at least about 90-92%, more preferably at least about 95%, and
most preferably at least about 98% sequence identity (including all integer values
falling within these described ranges). These percent identities are, for example,
relative to the claimed sequences, or other sequences of the present invention, when
the sequences of the present invention are used as the query sequence.

Alternatively, the degree of sequence similarity between polynucleotides can
be determined by hybridization of polynucleotides under conditions that form stable
duplexes between homologous regions, followed by digestion with single-stranded-
specific nuclease(s), and size determination of the digested fragments. Two DNA
sequences, or two polypeptide sequences, are "substantially homologous" to each other
when the sequences exhibit at least about 80-85%, preferably 85-90%, more preferably
90-95%, and most preferably 98-100% sequence identity to the reference sequence over
a defined length of the molecules, as determined using the methods above.
Substantially homologous also refers to sequences showing complete identity to the
specified DNA or polypeptide sequence. DNA sequences that are substantially
homologous can be identified in a Southern hybridization experiment under, for
example, stringent conditions, as defined for that particular system. Defining
appropriate hybridization conditions is within the skill of the art. See, e.g., Sambrook
et al., supra; DNA Cloning, supra; Nucleic Acid Hybridization, supra.

Two nucleic acid fragments are considered to "selectively hybridize" as
described herein. The degree of sequence identity between two nucleic acid
molecules affects the efficiency and strength of hybridization events between such
molecules. A partially identical nucleic acid sequence will at least partially inhibit a
completely identical sequence from hybridizing to a target molecule. Inhibition of
hybridization of the completely identical sequence can be assessed using
hybridization assays that are well known in the art (e.g., Southern blot, Northern blot,
solution hybridization, or the like; see Sambrook, et al., Molecular Cloning: A
Laboratory Manual, Second Edition, (1989) Cold Spring Harbor, N.Y.). Such assays
can be conducted using varying degrees of selectivity, for example, using conditions
varying from low to high stringency. If conditions of low stringency are employed,
the absence of non-specific binding can be assessed using a secondary probe that
lacks even a partial degree of sequence identity (for example, a probe having less than
about 30% sequence identity with the target molecule), such that, in the absence of
non-specific binding events, the secondary probe will not hybridize to the target.

When utilizing a hybridization-based detection system, a nucleic acid probe is
chosen that is complementary to a target nucleic acid sequence, and then by selection
of appropriate conditions the probe and the target sequence "selectively hybridize," or
bind, to each other to form a hybrid molecule. A nucleic acid molecule that is capable
of hybridizing selectively to a target sequence under "moderately stringent" conditions
typically hybridizes under conditions that allow detection of a target nucleic acid
sequence of at least about 10-14 nucleotides in length having at least approximately
70% sequence identity with the sequence of the selected nucleic acid probe. Stringent
hybridization conditions typically allow detection of target nucleic acid sequences of
at least about 10-14 nucleotides in length having a sequence identity of greater than
about 90-95% with the sequence of the selected nucleic acid probe. Hybridization
conditions useful for probe/target hybridization where the probe and target have a
specific degree of sequence identity can be determined as is known in the art (see, for
example, Nucleic Acid Hybridization: A Practical Approach, editors B.D. Hames and
S.J. Higgins, (1985) Oxford; Washington, DC; IRL Press).
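The two stringency regimes described above can be summarized as a rule of thumb. The sketch below encodes only the approximate thresholds stated in this section (targets of at least about 10-14 nucleotides; at least about 70% identity for moderately stringent conditions; greater than about 90-95% identity for stringent conditions). Using the lower end of each range (10 nucleotides, 90%) is an assumption, and the function name is hypothetical; actual conditions still have to be established empirically as described above.

```python
from typing import Optional

MIN_TARGET_LENGTH_NT = 10       # lower end of "about 10-14 nucleotides" (assumption)
MODERATE_MIN_IDENTITY = 70.0    # "at least approximately 70%" sequence identity
STRINGENT_MIN_IDENTITY = 90.0   # lower end of "greater than about 90-95%" (assumption)


def most_stringent_detecting_condition(target_length_nt: int,
                                       identity_pct: float) -> Optional[str]:
    """Most stringent regime expected to detect the probe/target pair, or None."""
    if target_length_nt < MIN_TARGET_LENGTH_NT:
        return None
    if identity_pct > STRINGENT_MIN_IDENTITY:
        return "stringent"
    if identity_pct >= MODERATE_MIN_IDENTITY:
        return "moderately stringent"
    return None


print(most_stringent_detecting_condition(20, 96.0))  # stringent
print(most_stringent_detecting_condition(20, 75.0))  # moderately stringent
print(most_stringent_detecting_condition(20, 30.0))  # None
```

A target detected under stringent conditions is also expected to be detected under moderately stringent ones, so the function reports only the most stringent regime that applies.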

With respect to stringency conditions for hybridization, it is well known in the
art that numerous equivalent conditions can be employed to establish a particular
stringency by varying, for example, the following factors: the length and nature of
probe and target sequences, base composition of the various sequences,
concentrations of salts and other hybridization solution components, the presence or
absence of blocking agents in the hybridization solutions (e.g., formamide, dextran
sulfate, and polyethylene glycol), hybridization reaction temperature and time
parameters, as well as wash conditions. A particular set of hybridization conditions is
selected following standard methods in the art (see, for example, Sambrook, et al.,
Molecular Cloning: A Laboratory Manual, Second Edition, (1989) Cold Spring
Harbor, N.Y.).

A "vector" is capable of transferring gene sequences to target cells. Typically,
"vector construct," "expression vector," and "gene transfer vector" mean any nucleic
acid construct capable of directing the expression of a gene of interest and which can
transfer gene sequences to target cells. Thus, the term includes cloning and
expression vehicles, as well as integrating vectors.

"Nucleic acid expression vector" or "expression cassette" refers to an
assembly that is capable of directing the expression of a sequence or gene of interest.
The nucleic acid expression vector includes a promoter that is operably linked to the
sequence(s) or gene(s) of interest. Other control elements may be present as well.
Expression cassettes described herein may be contained within a plasmid construct.
In addition to the components of the expression cassette, the plasmid construct may
also include a bacterial origin of replication, one or more selectable markers, a signal
which allows the plasmid construct to exist as single-stranded DNA (e.g., an M13
origin of replication), a multiple cloning site, and a "mammalian" origin of replication
(e.g., an SV40 or adenovirus origin of replication).

An "expression cassette" comprises any nucleic acid construct capable of
directing the expression of a gene/coding sequence of interest. Such cassettes can be
constructed into a "vector," "vector construct," "expression vector," or "gene transfer
vector," in order to transfer the expression cassette into target cells. Thus, the term
includes cloning and expression vehicles, as well as viral vectors.
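The relationship between an expression cassette and the plasmid construct that carries it can be pictured as a simple data structure. The sketch below is illustrative only: the class and field names are hypothetical, the element names in the usage lines are placeholders, and the only required parts are those the definitions above treat as essential (a promoter operably linked to a sequence of interest). Every plasmid backbone element is optional, and the order returned by `components()` is a listing order, not a physical map.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ExpressionCassette:
    promoter: str                  # transcription control element driving expression
    sequence_of_interest: str      # gene/coding sequence, e.g., a reporter
    other_control_elements: List[str] = field(default_factory=list)


@dataclass
class PlasmidConstruct:
    cassette: ExpressionCassette
    bacterial_origin: Optional[str] = None
    selectable_markers: List[str] = field(default_factory=list)
    single_strand_origin: Optional[str] = None   # e.g., an M13 origin
    multiple_cloning_site: bool = False
    mammalian_origin: Optional[str] = None       # e.g., an SV40 or adenovirus origin

    def components(self) -> List[str]:
        """List the elements present: cassette parts first, then backbone elements."""
        parts = [self.cassette.promoter, self.cassette.sequence_of_interest,
                 *self.cassette.other_control_elements]
        if self.bacterial_origin:
            parts.append(self.bacterial_origin)
        parts.extend(self.selectable_markers)
        if self.single_strand_origin:
            parts.append(self.single_strand_origin)
        if self.multiple_cloning_site:
            parts.append("multiple cloning site")
        if self.mammalian_origin:
            parts.append(self.mammalian_origin)
        return parts


construct = PlasmidConstruct(
    cassette=ExpressionCassette(promoter="CYP3A4 promoter", sequence_of_interest="luciferase"),
    bacterial_origin="bacterial origin",
    selectable_markers=["selectable marker"],
)
print(construct.components())
# ['CYP3A4 promoter', 'luciferase', 'bacterial origin', 'selectable marker']
```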

A variety of "reporter genes," also referred to as "reporter sequences" and
"marker sequences," i.e., genes or sequences the expression of which indicates the
expression of polynucleotide sequences of interest to which the reporter gene or
sequence is operably linked, can be employed. Preferred are those reporter sequences
that produce a protein product that is easily measured, preferably in a routine assay.
Suitable reporter genes include, but are not limited to, chloramphenicol acetyl
transferase (CAT), light-generating proteins (e.g., luc-encoded, lux-encoded,
fluorescent proteins), and beta-galactosidase. Convenient assays include, but are not
limited to, colorimetric, fluorimetric and enzymatic assays. In one aspect, reporter
genes may be employed that are expressed within the cell and whose products are
directly measured in the intracellular medium, or in an extract of the intracellular
medium, of a cultured cell line. This provides advantages over using a reporter gene
whose product is secreted, since the rate and efficiency of the secretion introduce
additional variables that may complicate interpretation of the assay. In a preferred
embodiment, the reporter gene is a light-generating protein. When using the
light-generating reporter proteins described herein, expression can be evaluated
accurately and non-invasively as described above (see, for example, Contag, P. R., et
al., (1998) Nature Med. 4:245-7; Contag, C. H., et al., (1997) Photochem. Photobiol.
66:523-31; Contag, C. H., et al., (1995) Mol. Microbiol. 18:593-603).

A "light generating protein" or "light-emitting protein" is a bioluminescent or
fluorescent protein capable of producing light typically in the range of 200 nm to
1100 nm, preferably in the visible spectrum (i.e., between approximately 350 nm and
800 nm). Bioluminescent proteins produce light through a chemical reaction
(typically requiring a substrate, energy source, and oxygen). Fluorescent proteins
produce light through the absorption and re-emission of radiation (such as with green
fluorescent protein). Examples of bioluminescent proteins include, but are not limited
to, the following: "luciferase," unless stated otherwise, includes procaryotic (e.g.,
bacterial lux-encoded) and eucaryotic (e.g., firefly luc-encoded) luciferases, as well as
variants possessing varied or altered optical properties, such as luciferases that
produce different colors of light (e.g., Kajiyama, N., and Nakano, E., Protein
Engineering 4(6):691-693 (1991)); and "photoproteins," for example, calcium
activated photoproteins (e.g., Lewis, J.C., et al., Fresenius J. Anal. Chem. 366(6-




wo 02/0X8305 



PCT/USU2/11770 



7):760-768 (2000)). Examples of fluorescent proteins include, but are not limited to,
green, yellow, cyan, blue, and red fluorescent proteins (e.g., Hadjantonakis, A.K., et
al., Histochem. Cell Biol. 115(1):49-58 (2001)).

"Bioluminescent protein substrate" describes a substrate (e.g., luciferin) of a
light-generating protein, e.g., a luciferase enzyme, from which the protein generates an
energetically decayed product and a photon of light, typically with the addition of an
energy source, such as ATP or FMNH2, and oxygen. Examples of such substrates
include, but are not limited to, decanal in the bacterial lux system,
4,5-dihydro-2-(6-hydroxy-2-benzothiazolyl)-4-thiazolecarboxylic acid (or simply
called luciferin) in the firefly luciferase (luc) system, "panal" in the bioluminescent
fungus Panellus stipticus system (Tetrahedron 44:1597-1602, 1988) and
N-iso-valeryl-3-aminopropanol in the earthworm Diplocardia longa system (Biochem.
15:1001-1004, 1976). In some systems, as described herein, aldehyde can be used as a
substrate for the light-generating protein.

"Light" is defined herein, unless stated otherwise, as electromagnetic radiation
having a wavelength of between about 200 nm (e.g., for UV-C) and about 1100 nm
(e.g., infrared). The wavelength of visible light ranges between approximately 350 nm
and approximately 800 nm (i.e., between about 3,500 angstroms and about 8,000
angstroms).
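The wavelength definitions above involve a simple unit conversion (1 nm = 10 angstroms) and two nested ranges. The sketch below classifies a wavelength using the bounds stated in this document (about 200-1100 nm for "light," about 350-800 nm for visible light). Treating the bounds as inclusive and labeling the flanking bands "ultraviolet" and "infrared" are assumptions of the example, and the function names are hypothetical.

```python
def nm_to_angstrom(wavelength_nm: float) -> float:
    """Convert nanometres to angstroms (1 nm = 10 angstroms)."""
    return wavelength_nm * 10.0


def classify_light(wavelength_nm: float) -> str:
    """Place a wavelength within the ranges defined above (bounds treated as inclusive)."""
    if not 200 <= wavelength_nm <= 1100:
        return "outside the defined range of light"
    if wavelength_nm < 350:
        return "ultraviolet"
    if wavelength_nm > 800:
        return "infrared"
    return "visible"


print(nm_to_angstrom(350), nm_to_angstrom(800))  # 3500.0 8000.0
print(classify_light(560))                       # visible
```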

"Animal" typically refers to a non-human animal, including, without
limitation, farm animals such as cattle, sheep, pigs, goats and horses; domestic
mammals such as dogs and cats; laboratory animals including ferrets, hares and
rabbits, and rodents such as mice, rats, hamsters, gerbils, and guinea pigs; and
non-human primates, including chimpanzees. The term "animal" may also include,
without limitation, birds, including domestic, wild and game birds such as chickens,
turkeys and other gallinaceous birds, ducks, geese, and the like, as well as amphibians,
fish, insects, reptiles, etc. The term does not denote a particular age. Thus, adult,
embryonic, fetal, and newborn individuals are intended to be covered.

A "transgenic animal" refers to a genetically engineered animal or offspring of
genetically engineered animals. A transgenic animal usually contains material from at
least one unrelated organism, such as from a virus, microorganism, plant, or other
animal. The term "chimeric animal" is used to refer to animals in which the
heterologous gene is found, or in which the heterologous gene is expressed, in some
but not all cells of the animal.

"Analyte" refers to any compound or substance whose effects (e.g., induction
or repression of a specific promoter) can be evaluated using the test animals and
methods of the present invention. Such analytes include, but are not limited to,
chemical compounds, pharmaceutical compounds, polypeptides, peptides,
polynucleotides, and polynucleotide analogs. Many organizations (e.g., the National
Institutes of Health, pharmaceutical and chemical corporations) have large libraries of
chemical or biological compounds from natural or synthetic processes, or
fermentation broths or extracts. Such compounds/analytes can be employed in the
practice of the present invention.

The term "positive selection marker" refers to a gene encoding a product that
enables only the cells that carry the gene to survive and/or grow under certain
conditions. For example, plant and animal cells that express the introduced neomycin
resistance (Neo^r) gene are resistant to the compound G418. Cells that do not carry the
Neo^r gene marker are killed by G418. Other positive selection markers will be known
to those of skill in the art. Typically, positive selection markers encode products that
can be readily assayed. Thus, positive selection markers can be used to determine
whether a particular DNA construct has been introduced into a cell, organ or tissue.

"Negative selection marker" refers to a gene encoding a product that can be
used to selectively kill and/or inhibit growth of cells under certain conditions. Non-
limiting examples of negative selection markers include a herpes simplex virus (HSV)-
thymidine kinase (TK) gene. Cells containing an active HSV-TK gene are incapable
of growing in the presence of ganciclovir or similar agents. Thus, depending on the
substrate, some gene products can act as either positive or negative selection markers.

The term "homologous recombination" refers to the exchange of DNA
fragments between two DNA molecules or chromatids at the site of essentially
identical nucleotide sequences. It is understood that substantially homologous
sequences can accommodate insertions, deletions, and substitutions in the nucleotide
sequence. Thus, linear sequences of nucleotides can be essentially identical even if
some of the nucleotide residues do not precisely correspond or align (see above).

A "knock-out" mutation refers to partial or complete loss of expression of at
least a portion of the target gene. Examples of knock-out mutations include, but are not
limited to, gene replacement by heterologous sequences, gene disruption by
heterologous sequences, and deletion of essential elements of the gene (e.g., promoter
region, portions of a coding sequence). A "knock-out" mutation is typically identified
by the phenotype generated by the mutation.

A "gene" as used in the context of the present invention is a sequence of
nucleotides in a genetic nucleic acid (chromosome, plasmid, etc.) with which a
genetic function is associated. A gene is a hereditary unit, for example of an
organism, comprising a polynucleotide sequence (e.g., a DNA sequence for
mammals) that occupies a specific physical location (a "locus," "gene locus," or
"genetic locus") within the genome of an organism. A gene can encode an expressed
product, such as a polypeptide or a polynucleotide (e.g., tRNA). Alternatively, a gene
may define a genomic location for a particular event/function, such as the binding of
proteins and/or nucleic acids (e.g., phage attachment sites), wherein the gene does not
encode an expressed product. Typically, a gene includes coding sequences, such as
polypeptide encoding sequences, and non-coding sequences, such as transcription
control elements (e.g., promoter sequences), polyadenylation sequences, and
transcriptional regulatory sequences (e.g., enhancer sequences). Many eucaryotic
genes have "exons" (coding sequences) interrupted by "introns" (non-coding
sequences). In certain cases, a gene may share sequences with another gene(s) (e.g.,
overlapping genes).

The "native sequence" of a gene refers to the polynucleotide
sequence that comprises the genetic locus corresponding to the gene, e.g., all
regulatory and open-reading frame coding sequences required for expression of a
completely functional gene product as they are present in the wild-type genome of an
organism. The native sequence of a gene can include, for example, transcriptional
promoter sequences, translation enhancing sequences, introns, exons, and poly-A
processing signal sites. It is noted that in the general population, wild-type genes may
include multiple prevalent versions that contain alterations in sequence relative to
each other and yet do not cause a discernible pathological effect. These variations are
designated "polymorphisms" or "allelic variations."

By "replacement sequence" is meant a polynucleotide sequence that is
substituted for at least a portion of the native or wild-type sequence of a gene.

"Linear vector" or "linearized vector" is a vector having two ends. For
example, circular vectors, such as plasmids, can be linearized by digestion with a
restriction endonuclease that cuts at a single site in the plasmid. Preferably, the
expression vectors described herein are linearized such that the ends are not within the
sequences of interest.
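The linearization step can be illustrated with a short sequence-level sketch. It assumes a palindromic recognition site (so that searching one strand is sufficient), treats the plasmid as circular so that a site spanning the sequence origin is still found, and, as a simplification, opens the plasmid at the start of the recognition site rather than at the enzyme's actual cut position. Regions of interest are given as hypothetical zero-based, half-open coordinate pairs; all names and sequences are illustrative.

```python
from typing import List, Sequence, Tuple


def find_sites(plasmid: str, site: str) -> List[int]:
    """Start positions of a recognition site in a circular sequence (one strand)."""
    wrapped = plasmid + plasmid[:len(site) - 1]   # lets matches run across the origin
    return [i for i in range(len(plasmid)) if wrapped.startswith(site, i)]


def linearize(plasmid: str, site: str, protected: Sequence[Tuple[int, int]]) -> str:
    """Open the plasmid at a unique site that lies outside every region of interest."""
    hits = find_sites(plasmid, site)
    if len(hits) != 1:
        raise ValueError(f"enzyme must cut once; found {len(hits)} sites")
    start = hits[0]
    n = len(plasmid)
    site_positions = {(start + k) % n for k in range(len(site))}
    for lo, hi in protected:
        if any(lo <= p < hi for p in site_positions):
            raise ValueError("site lies within a sequence of interest")
    # Rotate so that the linear molecule begins at the recognition site.
    return plasmid[start:] + plasmid[:start]


plasmid = "AAAAGAATTCCCCCGGGGTTTT"    # hypothetical 22-nt circular sequence
print(linearize(plasmid, "GAATTC", protected=[(12, 20)]))
# GAATTCCCCCGGGGTTTTAAAA
```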

Before describing the present invention in detail, it is to be understood that this
invention is not limited to particular formulations or method parameters as such may,
of course, vary. It is also to be understood that the terminology used herein is for the
purpose of describing particular embodiments of the invention only, and is not
intended to be limiting.

Although a number of methods and materials similar or equivalent to those 
described herein can be used in the practice of the present invention, the preferred 
materials and methods are described herein. 


2. Modes of Carrying Out the Invention

Throughout this application, various publications, patents, and published
patent applications are referred to by an identifying citation. The disclosures of these
publications, patents, and published patent specifications referenced in this
application help to describe the state of the art to which this invention pertains.

As used in this specification and the appended claims, the singular forms "a," 
"an" and "the" include plural references unless the content clearly dictates otherwise. 
Thus, for example, reference to "an expression construct" includes a mixture of two 
or more such agents. 


2.1.0 General Overview 

In one aspect, the present invention relates to transcription control elements
derived from cytochrome P450 genes (e.g., Cyp3A11 and CYP3A4), expression
cassettes which include those control elements, vector constructs, cells and transgenic
animals containing the expression cassettes, and methods of using the cells and
transgenic animals containing the expression cassettes, for example, as modeling,
screening and/or test systems. Methods of using the control elements, expression
cassettes, cells, and transgenic animals of the present invention include, but are not
limited to, studies involving toxicity and drug metabolism, and methods for screening
drug metabolism, safety and/or possible toxicity. Exemplary transcription control
elements useful in the practice of the present invention include those derived from the
mouse Cyp3A11 locus and those derived from the human CYP3A4 locus.

Experiments performed in support of the present invention demonstrate that the
effects of a compound on modulation of expression mediated by Cyp3A11 or
CYP3A4 transcriptional control elements can be directly monitored in live animals to
provide information about the effects of the compound, e.g., toxicity.

In one embodiment, the present invention relates to (1) transcription control
elements (e.g., promoters) derived from the mouse Cyp3A11 gene locus or from the
human CYP3A4 gene locus; (2) expression cassettes comprising such transcription
control elements operatively linked to genes encoding a gene product, such as a
reporter, a protein, polypeptide, hormone, ribozyme, or antisense RNA;
(3) recombinant cells comprising such expression cassettes; (4) methods of screening
for safety and/or possible toxicity using such cells (e.g., screening for toxicity or
safety of compounds which modulate expression mediated by the transcription
control elements of the present invention); (5) animals (e.g., transgenic or "liver
push") comprising the aforementioned transcription control elements, expression
cassettes, and/or vector constructs; (6) methods of monitoring safety and/or toxicity
using such animals; and (7) methods of screening for safety and/or toxicity of
compounds using such animals.

A variety of transcription control elements are useful in the practice of the
present invention, for example, transcription control elements derived from genes or
gene loci associated with drug metabolism. Specific locations of selected
transcriptional control elements within a defined polynucleotide sequence can be
identified by methods known to those of skill in the art, e.g., sequence comparison,
deletion analysis, and/or linker-insertion mutagenesis, in view of the teachings of the
present specification. An exemplary transcription control element can be one that is
associated with oxidative metabolism of drugs, for instance, a control element of a
gene of the P450 superfamily of hemoproteins, which metabolize a wide variety of
endogenous compounds and xenobiotics.
Particular embodiments of such transcription control elements include those
associated with the mouse Cyp3A11 gene and the human CYP3A4 gene. In this way,
expression of the reporter sequence is induced in the transgenic animals of the present
invention, for example, after administration of a candidate drug, and the safety
and/or toxicity of the drug can be evaluated by non-invasive imaging methods using
the whole animal. Various forms of the different embodiments of the invention,
described herein, may be combined.

Non-invasive imaging and/or detecting of light-emitting conjugates in
mammalian subjects was described in U.S. Patent Nos. 5,650,135 and 6,217,847, by
Contag, et al., issued July 22, 1997, and April 17, 2001, respectively. This imaging
technology can be used in the practice of the present invention in view of the
teachings of the present specification. In the imaging method, the conjugates contain
a biocompatible entity and a light-generating moiety. Biocompatible entities include,
but are not limited to, small molecules such as cyclic organic molecules;
macromolecules such as proteins; microorganisms such as viruses, bacteria, yeast and
fungi; eucaryotic cells; all types of pathogens and pathogenic substances; and
particles such as beads and liposomes. In another aspect, biocompatible entities may
be all or some of the cells that constitute the mammalian subject being imaged, for
example, cells carrying the expression cassettes of the present invention expressing a
reporter sequence.

Light-emitting capability is conferred on the biocompatible entities by the
conjugation of a light-generating moiety. Such moieties include fluorescent
molecules, fluorescent proteins, enzymatic reactions which give off photons, and
luminescent substances, such as bioluminescent proteins. In the context of the present
invention, light-emitting capability is typically conferred on target cells by having at
least one copy of a light-generating protein, e.g., a luciferase, present. In preferred
embodiments, luciferase is operably linked to appropriate control elements that can
facilitate expression of a polypeptide having luciferase activity. Substrates of
luciferase can be endogenous to the cell or applied to the cell or system (e.g., injection
of a suitable substrate for the luciferase, for example, luciferin, into a transgenic mouse
having cells carrying a luciferase construct). The conjugation may involve a
chemical coupling step, genetic engineering of a fusion protein, or the transformation
of a cell, microorganism or animal to express a light-generating protein.

Thus, in one aspect, the present invention relates to animal test systems and
methods for toxicology studies of an analyte of interest. In the practice of the present
invention, transgenic mammals or liver pushed animals are constructed where control
elements, for example, a promoter or transcriptional regulatory sequence, of two or
more stress-induced genes are operably linked to reporter gene coding sequences (for
example, luciferase). An appropriate substrate for the reporter gene product is
administered to the animal in addition to an analyte of interest. The order of
administration of these two substances can be empirically determined for each analyte
of interest. Induction of expression mediated by any of the control elements is then
evaluated by non-invasive imaging methods using the whole animal.
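One way to express the evaluation step numerically is as fold induction: the light signal measured from a region of the animal after administration of the analyte, divided by the signal measured from the same region before administration. This metric, the 2-fold cut-off, the region names, and the flux values in the sketch below are assumptions for illustration, not values from the experiments described herein; the function names are hypothetical.

```python
from typing import Dict


def fold_induction(baseline_flux: float, treated_flux: float) -> float:
    """Ratio of post-analyte to pre-analyte light output for one imaged region."""
    if baseline_flux <= 0:
        raise ValueError("baseline flux must be positive")
    return treated_flux / baseline_flux


def induced_regions(baseline: Dict[str, float], treated: Dict[str, float],
                    threshold: float = 2.0) -> Dict[str, float]:
    """Regions whose fold induction meets or exceeds the (assumed) threshold."""
    result = {}
    for region, before in baseline.items():
        if region not in treated:
            continue  # region was not imaged after dosing
        fold = fold_induction(before, treated[region])
        if fold >= threshold:
            result[region] = fold
    return result


baseline = {"liver": 1.0e5, "intestine": 2.0e5}   # hypothetical photons/s before dosing
treated = {"liver": 6.0e5, "intestine": 2.2e5}    # hypothetical photons/s after dosing
print(induced_regions(baseline, treated))         # {'liver': 6.0}
```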

In one aspect of the present invention, the animals described herein can be
used to evaluate the in vivo effects of high production volume (HPV) chemicals, for
example, by examining the effects of HPV chemicals on expression of toxicity-related
genes such as P450 genes. To date there are approximately 3,000 HPV chemicals within
the set of non-polymeric chemicals (polymeric chemicals tend to be poorly absorbed by
organisms and thus generally have low toxicity). Before the present invention, there
was no routine, effective way to evaluate the toxicity of these chemicals in vivo that
takes into account the toxicity not only of the chemical itself but also of its
metabolites (e.g., breakdown products).

Chemical producers and importers have been invited by the United States
Environmental Protection Agency (EPA) to provide basic toxicity information on
their high production volume (HPV) chemicals. HPV chemicals are chemicals
produced in or imported into the United States in amounts over 1 million pounds per
year. Each chemical company participating in the voluntary program will make a
commitment to identify chemicals that the company will adopt for testing. Following
the guidelines established by EPA, participating companies will perform the
following tasks: assess the adequacy of existing data; design and submit test plans;
provide test results as they are generated; and prepare summaries of the data
characterizing each chemical. Currently, the voluntary program uses the same tests,
testing protocols, and basic information summary formats employed by the Screening
Information Data Set (SIDS) program. SIDS is a cooperative, international effort to
secure basic toxicity information on HPV chemicals worldwide. Accordingly,
information prepared for the U.S. domestic program will be acceptable in the
international effort.

Of the approximately 3,000 chemicals that the U.S. imports or produces at
more than 1 million lbs./yr., a recent EPA analysis finds that 43% of these high
production volume chemicals have no testing data on basic toxicity and only seven
percent have a full set of basic test data (http://www.epa.gov/opptintr/chemrtk). This
lack of test data compromises the public's right to know about the chemicals that are
found in the environment, homes, workplaces, and products.

There are six basic tests that have been internationally agreed upon for
screening high production volume (HPV) chemicals for toxicity. The tests agreed to
under the Organization for Economic Cooperation and Development's Screening
Information Data Set (OECD/SIDS) program are: acute toxicity; chronic toxicity;
developmental/reproductive toxicity; mutagenicity; ecotoxicity; and environmental
fate. Several of these tests rely on animal models in which the animal must be
sacrificed to obtain toxicity data. The transgenic animals described herein are
useful for toxicity testing and avoid the need for a "death as the end-point" model.
Accordingly, use of the transgenic animals of the present invention to evaluate
toxicity will provide a more humane means of toxicity testing. Further, because
"death as the end-point" is not always necessary when using transgenic animals
carrying the reporter expression cassettes of the present invention, the costs associated
with toxicity testing in live animals can likely be reduced.

The EPA's Chemical Hazard Data Availability Study found major gaps in the
basic information that is readily available to the public. Most consumers assume that
basic toxicity testing is available and that all chemicals in commerce today are safe. A
recent EPA study has found that this is not a prudent assumption. The EPA has
reviewed the publicly available data on these chemicals and has learned that most of
them may never have been tested to determine how toxic they are to humans or the
environment. The EPA cannot begin to judge the hazards and risks of HPV consumer
chemicals without basic information; in fact, substantially more detailed and
exhaustive testing is needed to assess those high-exposure chemicals
(http://www.epa.gov/opptintr/chemrtk). It is clear that companies need to do more to
address this problem.

SIDS tests do not fully measure a chemical's toxicity. The tests provide only a
minimum set of information that can be used to determine the relative hazards of
chemicals and to judge whether additional testing is necessary. The transgenic
animals of the present invention, however, provide models for in vivo toxicity testing
that can greatly expand the information available about the hazards of these chemicals
and their metabolites.

OSHA sets Permissible Exposure Limits (PELs) for hazardous chemicals in
the workplace. It seems reasonable to expect that chemicals with PELs have been
thoroughly tested, at least for human health effects. However, even the high volume
chemicals with PELs have significant data gaps in the human health portion of the
basic screening test set. Only 53% of these high volume chemicals with PELs have
basic screening tests for all four of the human health endpoints. In contrast, only 5%
of the non-PEL HPV chemicals had all four health effects tests, and 49% had no health
test data available (http://www.epa.gov/opptintr/chemrtk). Thus, the bulk of HPV
chemicals without PELs lack even the minimal data needed to support development of
a PEL value to protect workers. The transgenic animals of the present invention
provide a means of toxicity testing that yields specific, in vivo data concerning the
toxicity not only of the chemicals themselves but also of their metabolites.

Finally, chemicals contained in consumer products are a major concern due to
the likelihood that children, as well as other sensitive populations (e.g., pregnant
women and health-compromised individuals), will be exposed to them. Although the
chemical industry has completed basic testing for more of these chemicals than is the
case for other HPV chemicals, a more complete evaluation of in vivo toxicity using the
transgenic animals of the present invention would be desirable. Given the great
exposure potential of consumer products, significantly more testing is needed to
assess the risks of such chemicals. The transgenic animals described herein help to
meet this need.

In a related aspect of the present invention, the transgenic animals described
herein can be used to evaluate the in vivo effects of endocrine disruptors (EDs). EDs
are typically chemicals that interfere with the normal functioning of the endocrine
system (including, for example, many pesticides and fertilizers). The increasing need
for evaluation of HPV chemicals and potential endocrine disruptors, driven both by
public interest and by mandates for testing from the U.S. Federal Government, is likely
to be met by the transgenic animals and accompanying compound screening methods
of the present invention.

Several classes of stress-related genes, and the promoters/control elements 
thereof, are described in more detail below. 

2.2.0 Promoters 

The expression cassettes, vectors, cells and transgenic animals described
herein contain a sequence encoding a detectable gene product, e.g., a luciferase gene,
operably linked to a transcription control element, e.g., a promoter. The promoter
may be from the same species as the transgenic animal (e.g., a mouse promoter used in
a construct to make a transgenic mouse) or from a different species (e.g., a human
promoter used in a construct to make a transgenic mouse). The promoter can be derived
from any gene of interest. In one embodiment of the present invention, the promoter
is derived from a gene whose expression is induced during oxidative metabolism, for
example, clearance of a drug via the liver. Thus, when a drug is administered to a
transgenic animal carrying a vector construct of the present invention, the promoter is
induced and the animal expresses luciferase, which can then be monitored in vivo.

Exemplary transcription control elements (e.g., promoters) for use in the
present invention include, but are not limited to, promoters derived from the
P450-related genes and gene families. In humans, 40 different P450 genes (designated
"CYP" genes) and 13 pseudogenes are currently known. These genes are classified
into 16 families based on amino acid sequence similarity. Families 1, 2 and 3 are
involved in drug metabolism, and over 90% of drug oxidation in humans is attributed
to only 6 CYP genes (1A2, 2C9, 2C19, 2D6, 2E1 and 3A4). Of these, CYP2C9, 2D6 and
3A4 contribute the most, with CYP3A4 accounting for 50-60% of the activity. Mouse
Cyp3A11 is described, for example, in Yanagimoto T. et al. (1997) Archives of
Biochemistry and Biophysics 340(2):215-8 and Toide K. et al. (1997) Archives of
Biochemistry and Biophysics 338(1):43-49. Human CYP3A4 is described, for
example, in Hashimoto H. et al. (1993) Eur J Biochem. 218(2):585-95; Goodwin B.
et al. (1999) Mol Pharmacol. 56(6):1329-39; and Bertilsson G. et al. (1998) Proc Natl
Acad Sci USA. 95(21):12208-13. Exemplified herein are transcription control
elements derived from mouse Cyp3A11 as well as transcription control elements
derived from human CYP3A4.

As one of skill in the art will appreciate in view of the teachings of the present
specification, transcription control element sequences can be derived and isolated
from, e.g., genomic sequences, using methods known in the art in view of the
teachings herein. For example, the transcription control element sequences of
Cyp3A11 were isolated and sequenced as described in Example 1 below.

Similarly, a sequence that confers liver-specific expression was obtained from
the CYP3A4 gene. It has been suggested that HNF4, COUP-TF/HNF4, GRE, and
rifampicin- and dexamethasone-responsive elements are located in the 10.5 kb
promoter region of CYP3A4 (Goodwin et al., supra). However, prior to the present
application, no regulatory elements had been described in the distal 2.5 kb region of
this locus. As described herein, when the activities of the 10.5 kb and the 13 kb
promoters were compared in liver-push experiments and transgenic animals, the 13 kb
promoter was found to mediate much higher expression in the liver than the 10.5 kb
promoter. Indeed, liver-push experiments showed that the 13 kb promoter activity was
25-fold higher than the activity observed using the 10.5 kb promoter. Furthermore, in
transgenic mice containing the 13 kb promoter, luciferase reporter expression was
highest in the liver, whereas luciferase expression in 10.5 kb transgenic animals was
high in the intestines. Thus, data obtained from liver-push experiments and transgenic
animals demonstrate that the approximately 2.5 kb distal fragment of the CYP3A4
promoter dramatically enhances liver-specific gene expression. Potential transcription
factor binding sites in this 2.5 kb fragment include four potential HNF-3b sites (in
opposite orientation) and two HNF-3b sites (in direct orientation).
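The distinction between sites in direct and opposite orientation can be illustrated by a simple scan of both DNA strands. The Python sketch below assumes an exact-match motif; the motif string `TGTTTAC` and the example sequence are hypothetical placeholders, not the HNF-3b sites identified in the CYP3A4 distal fragment, and practical binding-site prediction would typically use degenerate consensus sequences or position weight matrices instead.

```python
# Translation table for base complementation (upper- and lower-case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(sequence: str, motif: str) -> list[tuple[int, str]]:
    """Return (1-based start, orientation) for every exact match on either strand.

    A match to the motif itself on the given strand is reported as "direct";
    a match to its reverse complement (i.e., the motif on the other strand)
    is reported as "opposite". Palindromic motifs are reported once.
    """
    sequence = sequence.upper()
    motif = motif.upper()
    rc_motif = reverse_complement(motif)
    hits = []
    for i in range(len(sequence) - len(motif) + 1):
        window = sequence[i:i + len(motif)]
        if window == motif:
            hits.append((i + 1, "direct"))
        if window == rc_motif and rc_motif != motif:
            hits.append((i + 1, "opposite"))
    return hits


example = "GGTGTTTACAAGTAAACAGG"  # hypothetical test sequence
print(find_sites(example, "TGTTTAC"))  # [(3, 'direct'), (12, 'opposite')]
```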

Another exemplary method of isolating promoter sequences employs a
GenomeWalker® kit, commercially available from Clontech (Palo Alto, CA) and
described on page 27 of the 1997-1998 Clontech catalog.

2.2.1 Mouse Cyp3A11 and Human CYP3A4 Transcription Control
Element Sequences

The subject nucleic acids of the present invention (e.g., as described in
Example 1) find a wide variety of applications, including use as hybridization probes,
PCR primers, expression cassettes useful for compound screening, detecting the
presence of Cyp3A11 or CYP3A4 genes or variants thereof, detecting the presence of
gene transcripts, detecting or amplifying nucleic acids encoding additional Cyp3A11
or CYP3A4 promoter sequences or homologues thereof (as well as structural
analogs), and use in a variety of screening assays.

The present invention provides efficient methods for determining the toxicity
of pharmacological agents that are active at the level of Cyp3A11 or CYP3A4 gene
transcription. A wide variety of assays for transcriptional expression can be used
based on the teaching of the present specification, including, but not limited to,
cell-based transcription assays, screening in vivo in transgenic animals, and
promoter-protein binding assays. For example, the disclosed luciferase reporter
constructs are used to transfect cells for cell-based transcription assays. In one such
assay, primary endothelial cells are plated onto microtiter plates and used to screen
libraries of candidate agents for compounds that modulate the transcriptional
regulation of the Cyp3A11 or CYP3A4 gene promoters, as monitored by luciferase
expression (see Examples below).
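A plate-based screen of this kind might flag candidate modulators by comparing each test well with vehicle-only control wells. In the following Python sketch, the well layout, the luminescence readings and the two-fold cut-off (applied in both directions so that induction and repression are both captured) are hypothetical assumptions, not parameters taken from the Examples.

```python
from statistics import mean


def call_hits(readings: dict[str, float], vehicle_wells: list[str],
              threshold: float = 2.0) -> dict[str, float]:
    """Return {well: fold change vs. vehicle mean} for wells beyond the cut-off.

    A test well counts as a hit if it is induced (fold >= threshold) or
    repressed (fold <= 1 / threshold) relative to the vehicle baseline.
    """
    baseline = mean(readings[w] for w in vehicle_wells)
    hits = {}
    for well, value in readings.items():
        if well in vehicle_wells:
            continue  # control wells define the baseline; they are not candidates
        fold = value / baseline
        if fold >= threshold or fold <= 1.0 / threshold:
            hits[well] = fold
    return hits


# Hypothetical raw luminescence readings (relative light units).
plate = {"A1": 1000.0, "A2": 1100.0, "A3": 900.0,   # vehicle only
         "B1": 4500.0, "B2": 1200.0, "B3": 250.0}   # candidate compounds
print(call_hits(plate, vehicle_wells=["A1", "A2", "A3"]))  # {'B1': 4.5, 'B3': 0.25}
```

In practice, readings would usually also be normalized for cell number or transfection efficiency and replicated before a compound is called a modulator.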

As noted above, the present invention relates to a recombinant nucleic acid
molecule comprising transcription control elements derived from a mouse Cyp3A11
gene locus or from a human CYP3A4 locus. In particular, recombinant nucleic acid
molecules comprising SEQ ID NO: 12, SEQ ID NO: 13, SEQ ID NO: 14 and SEQ ID
NO: 15, as well as fragments thereof, are described. The fragments have
approximately 80% to 100% sequence identity (including all integer values
therebetween) to the disclosed sequences, for example, at least 80-85%, preferably
85-90%, more preferably 90-95%, and most preferably 98-100% sequence identity to
the reference sequence (i.e., the sequences of the present invention). The present
invention may also include a nucleic acid sequence substantially complementary to
said polynucleotide sequences, or fragments thereof, as well as a nucleic acid sequence
that specifically hybridizes to
said polynucleotide sequences or fragments thereof. 
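Percent sequence identity, as used in the ranges above, can be computed from a pairwise alignment in several ways. The Python sketch below assumes the two sequences have already been aligned elsewhere (gaps written as `-`) and counts identical positions over all alignment columns that are not gaps in both sequences. This denominator is one common convention chosen for illustration; other conventions (e.g., dividing by the length of the reference sequence) give different values.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences ('-' marks a gap).

    Columns where both sequences have a gap are ignored; a gap opposite a
    base counts as a mismatch. Comparison is case-insensitive.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" and y == "-":
            continue
        columns += 1
        if x == y:
            matches += 1
    if columns == 0:
        raise ValueError("alignment contains no informative columns")
    return 100.0 * matches / columns


print(percent_identity("ACGTACGTAC", "ACGTTCGTAC"))  # 9 of 10 columns match: 90.0
```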

The invention includes further transcription control element sequences (e.g.,
promoter sequences) identified based on the teachings of the present specification
(including, but not limited to, sequence information and isolation methods, e.g.,
Example 1).

The nucleic acid molecules of this invention are useful for producing
transfected cells, liver-push animals and transgenic animals that are themselves useful
in a variety of applications, and for screening compounds that modulate
P450-mediated metabolism for safety and/or possible toxicity (see Examples 2-4).

Those skilled in the art can practice the invention by following the guidance of
the specification supplemented with standard procedures of molecular biology for the
isolation and characterization of the Cyp3A11 and CYP3A4 transcription control
elements, their transfection into host cells, and expression of heterologous DNA
operably linked to said Cyp3A11 or CYP3A4 promoters. For example, DNA is
commonly transferred or introduced into recipient mammalian cells by calcium
phosphate-mediated gene transfer, electroporation, lipofection, viral infection, and the
like. General methods and vectors for gene transfer and expression may be found, for
example, in M. Kriegler, Gene Transfer and Expression: A Laboratory Manual,
Stockton Press (1990). Direct gene transfer to cells in vivo can be achieved, for
example, by the use of modified viral vectors (including, but not limited to,
retroviruses, adenoviruses, adeno-associated viruses and herpes viruses), liposomes,
and direct injection of DNA into certain cell types. In this manner, recombinant
expression vectors and recombinant cells containing the Cyp3A11 and CYP3A4
transcription control elements of the present invention operably linked to a desired
heterologous gene can be delivered to specific target cells in vivo. See, e.g., Wilson,
Nature, 365:691-692 (1993); Plautz et al., Annals NY Acad. Sci., 716:144-153
(1994); Farhood et al., Annals NY Acad. Sci., 716:23-34 (1994); and Hyde et al.,
Nature, 362:250-255 (1993). Furthermore, cells may be transformed ex vivo and
introduced directly at localized sites by injection, e.g., intra-articular, intracutaneous,
intramuscular injection and the like. In addition, recombinant expression vectors can
be delivered to animals via liver push, for example, by intravenous injection (see also
Experimental Materials and Methods, below).
Cloning and characterization of the Cyp3A11-locus-derived transcription
control elements and the CYP3A4-locus-derived transcription control elements are
described in Example 1, below. Cloning and characterization of transcription control
elements derived from the CYP3A4 locus are also described in the other Examples
below. Characterization of some of the 5' non-coding regions of the mouse Cyp3A11
locus and the human CYP3A4 locus is presented in Example 5 (see also Figures 18
and 19).

The present invention includes a polynucleotide effective to promote
transcription of an operably linked heterologous sequence, said polynucleotide
derived from the 5' non-coding region of the mouse Cyp3A11 gene. One aspect of
the present invention comprises the approximately 13 kb sequence (SEQ ID NO: 12)
and fragments thereof, in particular, fragments capable of functioning as transcription
promoters and/or transcription regulatory sequences.

In one embodiment, a transcription control element of the present invention
includes a polynucleotide derived from the mouse Cyp3A11 gene comprising a
polynucleotide sequence having 90% or greater identity to nucleotides 1-11,002 of
SEQ ID NO: 12. One aspect of the present invention comprises the approximately 9.3
kb sequence (SEQ ID NO: 13) and fragments thereof, in particular, fragments capable
of functioning as transcription promoters and/or transcription regulatory sequences.
In one embodiment, a transcription control element of the present invention includes a
polynucleotide derived from the mouse Cyp3A11 gene comprising a polynucleotide
sequence having 90% or greater identity to SEQ ID NO: 13. In another embodiment, a
transcription control element of the present invention includes a polynucleotide
derived from the mouse Cyp3A11 gene, said polynucleotide comprising a
polynucleotide having 90% identity or greater to nucleotides 5104-6218 of SEQ ID
NO: 13. In another embodiment, a transcription control element of the present
invention includes a polynucleotide derived from the mouse Cyp3A11 gene, said
polynucleotide comprising a polynucleotide having 90% identity or greater to
nucleotides 6792-9330 of SEQ ID NO: 13. In yet another embodiment, a transcription
control element of the present invention includes a polynucleotide derived from the
mouse Cyp3A11 gene, said polynucleotide comprising a first polynucleotide having
90% identity or greater to nucleotides 5104-6218 of SEQ ID NO: 13 and a second
polynucleotide having 90% identity or greater to nucleotides 6792-9330 of SEQ ID
NO: 13.
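The fragments above are defined by 1-based, inclusive nucleotide coordinates. As a sketch, assuming the reference sequence is held in a Python string (the short string used below is a placeholder, not SEQ ID NO: 13), a fragment and its length can be obtained as follows; the helper names `fragment` and `fragment_length` are hypothetical.

```python
def fragment(sequence: str, start: int, end: int) -> str:
    """Return nucleotides start..end (1-based, inclusive) of a sequence."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("coordinates out of range")
    # Python slices are 0-based and end-exclusive, hence start - 1 and end.
    return sequence[start - 1:end]


def fragment_length(start: int, end: int) -> int:
    """Length of a 1-based, inclusive coordinate range."""
    return end - start + 1


print(fragment_length(5104, 6218))   # 1115
print(fragment_length(6792, 9330))   # 2539
print(fragment("ACGTACGTAC", 3, 6))  # GTAC
```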

The polynucleotides of the present invention, e.g., a polynucleotide
comprising the 9.3 kb sequence (SEQ ID NO: 13), or polynucleotides comprising
fragments thereof, may also be associated with a basal promoter in order to confer
certain regulatory characteristics on the basal promoter (a basal promoter may, for
example, comprise a minimum unit necessary to promote transcription, e.g., a TATA
box).

The present invention includes, but is not limited to, isolated polynucleotides
(for example, those just described), methods of use of such polynucleotides, vectors
comprising such polynucleotides, expression cassettes comprising such
polynucleotides, recombinant cells comprising such polynucleotides, liver-push
non-human animals comprising such polynucleotides, and transgenic, non-human
animals comprising such polynucleotides. In one embodiment, the present invention
includes a transgenic, non-human animal (e.g., a rat or a mouse) comprising a
Cyp3A11-derived polynucleotide operably linked to a reporter gene (e.g., a gene
encoding a light-generating protein).

The present invention includes a polynucleotide effective to promote
transcription of an operably linked heterologous sequence, said polynucleotide
derived from the 5' non-coding region of the human CYP3A4 gene. One aspect of
the present invention comprises the approximately 13 kb sequence (SEQ ID NO: 14)
and fragments thereof, in particular, fragments capable of functioning as transcription
promoters and/or transcription regulatory sequences. One such exemplary fragment is
identified by SEQ ID NO: 15. In one embodiment, a transcription control element of
the present invention includes a polynucleotide derived from the human CYP3A4
gene comprising a polynucleotide sequence having 90% or greater identity to
nucleotides 1-13,032 of SEQ ID NO: 14.

In one embodiment, a transcription control element of the present invention
includes a polynucleotide derived from the human CYP3A4 gene comprising a
polynucleotide sequence having 90% or greater identity to SEQ ID NO: 14. In another
embodiment, a transcription control element of the present invention includes a
polynucleotide derived from the human CYP3A4 gene, said polynucleotide
comprising a polynucleotide having 90% identity or greater to nucleotides 1290-2446
of SEQ ID NO: 14. In another embodiment, a transcription control element of the
present invention includes a polynucleotide derived from the human CYP3A4 gene,
said polynucleotide comprising a polynucleotide having 90% identity or greater to
nucleotides 2758-4111 of SEQ ID NO: 14. In another embodiment, a transcription
control element of the present invention includes a polynucleotide derived from the
human CYP3A4 gene, said polynucleotide comprising a polynucleotide having 90%
identity or greater to nucleotides 4424-6010 of SEQ ID NO: 14. In another
embodiment, a transcription control element of the present invention includes a
polynucleotide derived from the human CYP3A4 gene, said polynucleotide
comprising a polynucleotide having 90% identity or greater to nucleotides 6317-9099
of SEQ ID NO: 14. In another embodiment, a transcription control element of the
present invention includes a polynucleotide derived from the human CYP3A4 gene,
said polynucleotide comprising a polynucleotide having 90% identity or greater to
nucleotides 9401-12998 of SEQ ID NO: 14. In yet another embodiment, a
transcription control element of the present invention includes a polynucleotide
derived from the human CYP3A4 gene, said polynucleotide comprising a first
polynucleotide having 90% identity or greater to nucleotides 1290-2446 of SEQ ID
NO: 14, a second polynucleotide having 90% identity or greater to nucleotides
2758-4111 of SEQ ID NO: 14, a third polynucleotide having 90% identity or greater to
nucleotides 4424-6010 of SEQ ID NO: 14, a fourth polynucleotide having 90%
identity or greater to nucleotides 6317-9099 of SEQ ID NO: 14, and a fifth
polynucleotide having 90% identity or greater to nucleotides 9401-12998 of SEQ ID
NO: 14.
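Using the same 1-based, inclusive convention, the lengths of the five CYP3A4-derived regions recited above and the spacing between them can be tabulated directly from the coordinates given in the text. This Python sketch is a simple arithmetic check, not an analysis reported in the specification.

```python
# 1-based, inclusive coordinates on SEQ ID NO: 14, as recited in the text.
REGIONS = [(1290, 2446), (2758, 4111), (4424, 6010), (6317, 9099), (9401, 12998)]


def region_lengths(regions: list[tuple[int, int]]) -> list[int]:
    """Length in nucleotides of each inclusive (start, end) range."""
    return [end - start + 1 for start, end in regions]


def gaps_between(regions: list[tuple[int, int]]) -> list[int]:
    """Nucleotides lying strictly between consecutive, ordered regions."""
    return [regions[i + 1][0] - regions[i][1] - 1 for i in range(len(regions) - 1)]


print(region_lengths(REGIONS))  # [1157, 1354, 1587, 2783, 3598]
print(gaps_between(REGIONS))    # [311, 312, 306, 301]
```

All four gaps are positive, so the five recited regions are ordered and non-overlapping along SEQ ID NO: 14.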

The polynucleotides of the present invention, e.g., a polynucleotide
comprising the approximately 13 kb sequence (SEQ ID NO: 14), or polynucleotides
comprising fragments thereof, may also be associated with a basal promoter in order to
confer certain regulatory characteristics on the basal promoter (a basal promoter may,
for example, comprise a minimum unit necessary to promote transcription, e.g., a
TATA box).

The present invention includes, but is not limited to, isolated polynucleotides
(for example, those just described), methods of use of such polynucleotides, vectors
comprising such polynucleotides, expression cassettes comprising such
polynucleotides, recombinant cells comprising such polynucleotides, liver-push
non-human animals comprising such polynucleotides, and transgenic, non-human
animals comprising such polynucleotides.

A preferred embodiment of one aspect of the present invention includes a
transgenic, non-human animal that comprises a transcription control element derived
from the human CYP3A4 gene (e.g., a polynucleotide sequence having 90% or
greater identity to nucleotides 1-13,032 of SEQ ID NO: 14) operably linked to a
heterologous sequence (e.g., a sequence encoding a light-generating protein, for
example, luciferase). In one embodiment, the transgenic, non-human animal does not
comprise a polynucleotide encoding hPXR (a human rifampicin co-receptor); that is,
the transgenic, non-human animal does not express a functional human PXR protein.
For example, a transgenic rodent (e.g., a mouse or rat) has been generated that
comprises a transcription control element derived from the human CYP3A4 gene but
does not express the human rifampicin co-receptor (see Example 6).

2.3.0 Expression Cassettes and Vectors 

The expression cassettes described herein typically include the following
components: (1) a polynucleotide encoding a reporter gene product, such as a sequence
encoding a light-generating protein; and (2) a transcription control element operably
linked to the reporter gene sequence, wherein the control element is heterologous to
the coding sequences of the light-generating protein (e.g., the Cyp3A11 and CYP3A4
sequences of the present invention). Transcription control elements derived from the
sequences provided herein may be associated with, for example, a basal transcription
promoter to confer the regulation provided by such control elements on the basal
transcription promoter. Exemplary expression constructs are described in Example 1.

The present invention also includes providing such expression cassettes in
vectors comprising, for example, a suitable vector backbone and, optionally, a
sequence encoding a selection marker, e.g., a positive or negative selection marker.
Suitable vector backbones generally include an f1 origin of replication; a ColE1
plasmid-derived origin of replication; polyadenylation sequence(s); sequences
encoding antibiotic resistance (e.g., ampicillin resistance); and other regulatory or
control elements. Non-limiting examples of appropriate backbones include
pBluescriptSK (Stratagene, La Jolla, CA), pBluescriptKS (Stratagene, La Jolla, CA)
and other commercially available vectors.
A variety of reporter genes may be used in the practice of the present
invention. Preferred are those that produce a protein product that is easily
measured in a routine assay. Suitable reporter genes include, but are not limited to,
chloramphenicol acetyl transferase (CAT), light-generating proteins (e.g., luciferase),
and beta-galactosidase. Convenient assays include, but are not limited to,
colorimetric, fluorimetric and enzymatic assays. In one aspect, reporter genes may be
employed whose products are retained within the cell and are measured directly in
the cells of a cultured cell line or in an extract of those cells. This provides advantages
over using a reporter gene whose product is secreted, since the rate and efficiency of
secretion introduce additional variables that may complicate interpretation of the
assay. In a preferred embodiment, the reporter gene encodes a light-generating protein.
When using the light-generating reporter proteins described herein, expression can be
evaluated accurately and non-invasively as described above (see, for example,
Contag, P. R., et al. (1998) Nature Med. 4:245-7; Contag, C. H., et al. (1997)
Photochem Photobiol. 66:523-31; Contag, C. H., et al. (1995) Mol Microbiol.
18:593-603).

In one aspect of the invention, the light-generating protein is luciferase. Luciferase
coding sequences useful in the practice of the present invention include sequences
obtained from lux genes (prokaryotic genes encoding a luciferase activity) and luc
genes (eukaryotic genes encoding a luciferase activity). A variety of
luciferase-encoding genes have been identified, including, but not limited to, the
following: B.A. Sherf and K.V. Wood, U.S. Patent No. 5,670,356, issued 23
September 1997; Kazami, J., et al., U.S. Patent No. 5,604,123, issued 18 February
1997; S. Zenno, et al., U.S. Patent No. 5,618,722; K.V. Wood, U.S. Patent No.
5,650,289, issued 22 July 1997; K.V. Wood, U.S. Patent No. 5,641,641, issued 24 June
1997; N. Kajiyama and E. Nakano, U.S. Patent No. 5,229,285, issued 20 July 1993;
M.J. Cormier and W.W. Lorenz, U.S. Patent No. 5,292,658, issued 8 March 1994;
M.J. Cormier and W.W. Lorenz, U.S. Patent No. 5,418,155, issued 23 May 1995;
de Wet, J.R., et al., Molec. Cell. Biol. 7:725-737, 1987; Tatsumi, H.N., et al.,
Biochim. Biophys. Acta 1131:161-165, 1992; and Wood, K.V., et al., Science
244:700-702, 1989. Another group of bioluminescent proteins includes
light-generating proteins of the aequorin family (Prasher, D. C., et al., Biochem.
26:1326-1332 (1987)). Luciferases, as well as aequorin-like molecules, require a
source of energy, such as ATP, NAD(P)H, and the like, and a substrate, such as
luciferin or coelenterazine, and oxygen.

Wild-type firefly luciferases typically have emission maxima at about 550 nm.
Numerous variants with distinct emission maxima have also been studied. For
example, Kajiyama and Nakano (Protein Eng. 4(6):691-693, 1991; U.S. Patent No.
5,330,906, issued 19 July 1994) teach five variant firefly luciferases generated by
single amino acid changes to the Luciola cruciata luciferase coding sequence. The
variants have emission peaks of 558 nm, 595 nm, 607 nm, 609 nm and 612 nm. A
yellow-green luciferase with an emission peak of about 540 nm is commercially
available from Promega (Madison, WI) in the pGL3 vectors. A red luciferase with
an emission peak of about 610 nm is described, for example, in Contag et al. (1998)
Nat. Med. 4:245-247 and Kajiyama et al. (1991) Protein Eng. 4:691-693. The coding
sequence of a luciferase derived from Renilla muelleri has also been described
(mRNA, GENBANK Accession No. AY015988; protein, Accession No. AAG54094).

In another aspect of the present invention, the light-generating protein is a
fluorescent protein, for example, a blue, cyan, green, yellow, or red fluorescent
protein.

Several light-generating protein coding sequences are commercially available,
including, but not limited to, the following. Clontech (Palo Alto, CA) provides
coding sequences for luciferase and a variety of fluorescent proteins, including blue,
cyan, green, yellow, and red fluorescent proteins. Enhanced green fluorescent protein
(EGFP) variants are well expressed in mammalian systems and tend to exhibit
brighter fluorescence than wild-type GFP. Enhanced fluorescent proteins include
enhanced green fluorescent protein (EGFP), enhanced cyan fluorescent protein
(ECFP), and enhanced yellow fluorescent protein (EYFP). Further, Clontech
provides destabilized enhanced fluorescent protein (dEFP) variants that feature rapid
turnover rates. The shorter half-life of the dEFP variants makes them useful in kinetic
studies and as quantitative reporters. DsRed coding sequences are available from
Clontech (http://www.clontech.com/techinto/vectors/vectorsD/text/pDsRed.txt).
DsRed is a red fluorescent protein useful in expression studies. Further,
Fradkov, A.F., et al. described a novel fluorescent protein from Discosoma coral and
its mutants, which possess a unique far-red fluorescence (FEBS Lett. 479(3):127-130
(2000)) (mRNA sequence, GENBANK Accession No. AF272711; protein sequence,
GENBANK Accession No. AAG16224). Promega (Madison, WI) also provides coding
sequences for firefly luciferase (for example, as contained in the pGL3 vectors).
Further, coding sequences for a number of fluorescent proteins are available from
GENBANK, for example, accession numbers AY015995, AF322221, AF080431,
AF292560, AF292559, AF292558, AF292557, AF139645, U47298, U47297,
AY015988, AY015994, and AF292556.

Modified lux coding sequences have also been described, e.g., WO 01/18195, published 15 March 2001, Xenogen Corporation. In addition, further light-generating systems may be employed, for example, when evaluating expression in cells. Such systems include, but are not limited to, the Luminescent beta-Galactosidase Genetic Reporter System (Clontech).

Positive selection markers include any gene encoding a product that can be readily assayed. Examples include, but are not limited to, an HPRT gene (Littlefield, J. W., Science 145:709-710 (1964)), a xanthine-guanine phosphoribosyltransferase (GPT) gene, or an adenosine phosphoribosyltransferase (APRT) gene (Sambrook et al., supra), a thymidine kinase gene (i.e., "TK") and especially the TK gene of the herpes simplex virus (Giphart-Gassler, M. et al., Mutat. Res. 214:223-232 (1989)), an nptII gene (Thomas, K. R. et al., Cell 51:503-512 (1987); Mansour, S. L. et al., Nature 336:348-352 (1988)), or other genes which confer resistance to amino acid or nucleoside analogues, or antibiotics, etc., for example, gene sequences which encode enzymes such as dihydrofolate reductase (DHFR), adenosine deaminase (ADA), asparagine synthetase (AS), hygromycin B phosphotransferase, or a CAD enzyme (carbamyl phosphate synthetase, aspartate transcarbamylase, and dihydroorotase). Addition of the appropriate substrate of the positive selection marker can be used to determine whether the product of the positive selection marker is expressed; for example, cells which do not express the positive selection marker nptII are killed when exposed to the substrate G418 (Gibco BRL Life Technologies, Gaithersburg, MD).

The vector typically contains insertion sites for inserting polynucleotide sequences of interest, e.g., the Cyp3A11 and CYP3A4 sequences of the present invention. These insertion sites are preferably included such that there are two sites, one site on either side of the sequences encoding the positive selection marker, luciferase and the promoter. Insertion sites are, for example, restriction endonuclease recognition sites, and can, for example, represent unique restriction sites. In this way, the vector can be digested with the appropriate enzymes and the sequences of interest ligated into the vector.
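Because the insertion sites can be unique restriction sites, one way to vet candidate sites before digestion is to scan the vector sequence on both strands for each recognition sequence. The following is a minimal sketch in Python, assuming a linear sequence held as a plain string; the function names and the toy vector fragment are hypothetical illustrations, not part of the described constructs. GAATTC (EcoRI) and GGATCC (BamHI) are used only as familiar example sites.

```python
"""Locate restriction enzyme recognition sites on both strands of a DNA sequence.

The sequence is treated as linear; a site spanning the origin of a circular
plasmid would be missed.
"""

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_sites(seq: str, site: str) -> list[int]:
    """Return sorted 0-based start positions of `site` on either strand.

    Palindromic sites (e.g., GAATTC) equal their own reverse complement,
    so each occurrence is reported once.
    """
    seq, site = seq.upper(), site.upper()
    hits = set()
    for target in {site, reverse_complement(site)}:
        start = seq.find(target)
        while start != -1:
            hits.add(start)
            start = seq.find(target, start + 1)
    return sorted(hits)


def is_unique_site(seq: str, site: str) -> bool:
    """True if the recognition site occurs exactly once in the sequence."""
    return len(find_sites(seq, site)) == 1


# Hypothetical toy vector fragment.
vector = "TTGAATTCAAGGATCCTTAAGGATCCAA"
print(find_sites(vector, "GAATTC"), is_unique_site(vector, "GAATTC"))  # EcoRI
print(find_sites(vector, "GGATCC"), is_unique_site(vector, "GGATCC"))  # BamHI
```

In this toy fragment the EcoRI site occurs once and would qualify as a unique insertion site, while the BamHI site occurs twice and would not.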

Optionally, the vector construct can contain a polynucleotide encoding a negative selection marker. Suitable negative selection markers include, but are not limited to, HSV-tk (see, e.g., Majzoub et al. (1996) New Engl. J. Med. 334:904-907 and U.S. Patent No. 5,464,764), as well as genes encoding various toxins including the diphtheria toxin, the tetanus toxin, the cholera toxin and the pertussis toxin. A further negative selection marker gene is the hypoxanthine-guanine phosphoribosyltransferase (HPRT) gene for negative selection in 6-thioguanine.

Exemplary promoters for use in the practice of the present invention are 
described above. 

Vector Construction: The vectors described herein can be constructed utilizing methodologies known in the art of molecular biology (see, for example, Ausubel or Maniatis) in view of the teachings of the specification. As described above, the vector constructs containing the expression cassettes are assembled by inserting the desired components into a suitable vector backbone, for example, (1) polynucleotides encoding a reporter protein, such as a light-generating protein, e.g., a luciferase gene, operably linked to a transcription control element(s) of interest; (2) a sequence encoding a positive selection marker; and, optionally, (3) a sequence encoding a negative selection marker. In addition, the vector construct contains insertion sites such that additional sequences of interest can be readily inserted to flank the sequence encoding the positive selection marker and the luciferase-encoding sequence.
A preferred method of obtaining polynucleotides, such as suitable regulatory sequences (e.g., promoters), is PCR. General procedures for PCR are taught in MacPherson et al., PCR: A PRACTICAL APPROACH (IRL Press at Oxford University Press, 1991). PCR conditions for each amplification reaction may be empirically determined. A number of parameters influence the success of a reaction. Among these parameters are annealing temperature and time, extension time, Mg2+ and ATP concentration, pH, and the relative concentration of primers, templates and deoxyribonucleotides. Exemplary primers are described below in the Examples. After amplification, the resulting fragments can be detected by agarose gel electrophoresis followed by visualization with ethidium bromide staining and ultraviolet illumination.

In one embodiment, PCR can be used to amplify fragments from genomic libraries. Many genomic libraries are commercially available. Alternatively, libraries can be produced by any method known in the art. Preferably, the organism(s) from which the DNA is isolated have no discernible disease or phenotypic effects. This isolated DNA may be obtained from any cell source or body fluid (e.g., ES cells, liver, kidney, blood cells, buccal cells, cervicovaginal cells, epithelial cells from urine, fetal cells, or any cells present in tissue obtained by biopsy, urine, blood, cerebrospinal fluid (CSF), and tissue exudates at the site of infection or inflammation). DNA is extracted from the cells or body fluid using known methods of cell lysis and DNA purification. The purified DNA is then introduced into a suitable expression system, for example a lambda phage. Another method for obtaining polynucleotides, for example, short, random nucleotide sequences, is by enzymatic digestion.

Polynucleotides are inserted into vector backbones using methods known in the art. For example, insert and vector DNA can be contacted, under suitable conditions, with a restriction enzyme to create complementary or blunt ends on each molecule that can pair with each other and be joined with a ligase. Alternatively, synthetic nucleic acid linkers can be ligated to the termini of a polynucleotide. These synthetic linkers can contain nucleic acid sequences that correspond to a particular restriction site in the vector DNA. Other means are known and, in view of the teachings herein, can be used.

The vector backbone may comprise components functional in more than one selected organism in order to provide a shuttle vector, for example, a bacterial origin of replication and a eukaryotic promoter. Alternately, the vector backbone may comprise an integrating vector, i.e., a vector that is used for random or site-directed integration into a target genome.
The final constructs can be used immediately (e.g., for introduction into ES cells or for liver-push assays), or stored frozen (e.g., at -20°C) until use. In some embodiments, the constructs are linearized prior to use, for example by digestion with suitable restriction endonucleases.

2.4.0 Liver Push Animals

The expression cassettes of the present invention may be introduced (extra-genomically) into an animal in order to practice the methods described herein. High levels of foreign gene expression have been obtained in muscle and liver via direct injection of naked plasmid DNA. In addition, high levels of expression can also be achieved by direct, intravascular administration of naked plasmid DNA into the vessels supplying the liver or muscle. See Wolff et al. (1990) Science 247:1465-1468; Budker et al. (1996) Gene Ther. 3:593-598; Budker et al. (1998) Gene Ther. 5:272-276; Zhang et al. (1997) Human Gene Ther. 8:1763-1772. Recently, Zhang et al. (1999) Human Gene Ther. 10:1735-1737 reported that high levels of foreign gene expression were seen in hepatocytes following tail vein injections of naked plasmid DNA.

Thus, in a preferred embodiment, the expression cassettes described herein are injected intravenously (e.g., into the tail vein of a mouse) in amounts, volumes and durations that are sufficient to achieve expression in hepatocytes. Determining such amounts and volumes is within the purview of one of skill in the art. For example, the volume of DNA injection is preferably relatively large, for example between about 1 and 10 mL, more preferably between about 1 and 5 mL, even more preferably between about 1 and 3 mL, and most preferably around 2.5 mL. The DNA may be administered in an aqueous solution or in any pharmaceutically acceptable vehicle such as Ringer's solution. Other acceptable vehicles are known to those of skill in the art and are described, for example, in Remington's, supra. The amount of DNA can be similarly determined and is preferably between about 5 and 1000 µg, more preferably between about 10 and 500 µg, and even more preferably between about 10 and 300 µg. Furthermore, the injections are preferably relatively rapid, e.g., the entire volume is injected over a period of less than 2 minutes, more preferably less than 1 minute, and even more preferably less than 30 seconds.

WO 02/088305 PCT/US02/11770
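The nested preferred ranges for volume, DNA amount and injection time can be checked mechanically for a planned injection, and the average infusion rate follows directly from volume and duration. The sketch below is hypothetical: it encodes the ranges stated in the text, treats "about" as an inclusive bound (an assumption), and reports the narrowest tier each parameter falls in (0 = preferred, 1 = more preferred, 2 = even more preferred, -1 = outside all ranges).

```python
"""Check a planned tail-vein injection against the preferred ranges above.

Tiers run from broadest (0) to narrowest (2); -1 means outside all ranges.
"About" is treated here as an inclusive bound (an assumption).
"""

VOLUME_ML = [(1.0, 10.0), (1.0, 5.0), (1.0, 3.0)]
DNA_UG = [(5.0, 1000.0), (10.0, 500.0), (10.0, 300.0)]
MAX_DURATION_S = [120.0, 60.0, 30.0]  # "less than" limits


def range_tier(value: float, ranges: list[tuple[float, float]]) -> int:
    """Return the index of the narrowest nested range containing value."""
    tier = -1
    for i, (low, high) in enumerate(ranges):
        if low <= value <= high:
            tier = i
    return tier


def duration_tier(seconds: float) -> int:
    """Return the index of the strictest 'less than' limit the duration meets."""
    tier = -1
    for i, limit in enumerate(MAX_DURATION_S):
        if seconds < limit:
            tier = i
    return tier


def assess_injection(volume_ml: float, dna_ug: float, duration_s: float) -> dict:
    """Summarize the tier of each parameter and the average rate in mL/min."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return {
        "volume_tier": range_tier(volume_ml, VOLUME_ML),
        "dna_tier": range_tier(dna_ug, DNA_UG),
        "duration_tier": duration_tier(duration_s),
        "rate_ml_per_min": volume_ml / duration_s * 60.0,
    }


if __name__ == "__main__":
    # Hypothetical plan: 2.5 mL carrying 50 ug of plasmid, delivered in 20 s.
    print(assess_injection(2.5, 50.0, 20.0))
```

The most preferred volume of about 2.5 mL is a point value rather than a range, so it is not encoded as a separate tier in this sketch.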

2.5.0 Transgenic Animals 

The expression cassettes of the present invention may be introduced into the genome of an animal in order to produce transgenic animals for purposes of practicing the methods of the present invention. In a preferred embodiment of the present invention, the transgenic animal is a transgenic rodent, for example, a mouse, rat or guinea pig. When a light-generating protein is used as a reporter, imaging is typically carried out using an intact, living, non-human transgenic animal, for example, a living, transgenic rodent (e.g., a mouse or rat). A variety of transformation techniques are well known in the art. Those methods include the following.

(i) Direct microinjection into nuclei: Expression cassettes can be microinjected directly into animal cell nuclei using micropipettes to mechanically transfer the recombinant DNA. This method has the advantage of not exposing the DNA to cellular compartments other than the nucleus and of yielding stable recombinants at high frequency. See Capecchi, M., Cell 22:479-488 (1980).

For example, the expression cassettes of the present invention may be microinjected into the early male pronucleus of a zygote as early as possible after the formation of the male pronucleus membrane, and prior to its being processed by the zygote female pronucleus. Thus, microinjection according to this method should be undertaken when the male and female pronuclei are well separated and both are located close to the cell membrane. See, e.g., U.S. Patent No. 4,873,191 to Wagner et al. (issued October 10, 1989); and Richa, J., (2001) "Production of Transgenic Mice," Molecular Biotechnology, March 2001 vol. 17:261-8.

(ii) ES Cell Transfection: The DNA containing the expression cassettes of the present invention can also be introduced into embryonic stem ("ES") cells. ES cell clones which undergo homologous recombination with a targeting vector are identified, and ES cell-mouse chimeras are then produced. Homozygous animals are produced by mating of hemizygous chimera animals. Procedures are described in, e.g., Koller, B.H. and Smithies, O., (1992) "Altering genes in animals by gene targeting," Annual Review of Immunology 10:705-30.

(iii) Electroporation: The DNA containing the expression cassettes of the present invention can also be introduced into the animal cells by electroporation. In this technique, animal cells are electroporated in the presence of DNA containing the expression cassette. Electrical impulses of high field strength reversibly permeabilize biomembranes, allowing the introduction of the DNA. The pores created during electroporation permit the uptake of macromolecules such as DNA. Procedures are described in, e.g., Potter, H., et al., Proc. Nat'l. Acad. Sci. U.S.A. 81:7161-7165 (1984); and Sambrook, ch. 16.

(iv) Calcium phosphate precipitation: The expression cassettes may also be transferred into cells by other methods of direct uptake, for example, using calcium phosphate. See, e.g., Graham, F., and A. Van der Eb, Virology 52:456-467 (1973); and Sambrook, ch. 16.

(v) Liposomes: Encapsulation of DNA within artificial membrane vesicles (liposomes) followed by fusion of the liposomes with the target cell membrane can also be used to introduce DNA into animal cells. See Mannino, R. and S. Gould-Fogerite, BioTechniques, 6:682 (1988).

(vi) Viral capsids: Viruses and empty viral capsids can also be used to
incorporate DNA and transfer the DNA to animal cells. For example, DNA can be
incorporated into empty polyoma viral capsids and then delivered to polyoma- 
susceptible cells. See, e.g., Slilaty, S. and H. Aposhian, Science 220:725 (1983). 

(vii) Transfection using polybrene or DEAE-dextran: These techniques are
described in Sambrook, ch. 16. 

(viii) Protoplast fusion: Protoplast fusion typically involves the fusion of bacterial protoplasts carrying high numbers of a plasmid of interest with cultured animal cells, usually mediated by treatment with polyethylene glycol. See Rassoulzadegan, M., et al., Nature, 295:257 (1982).

(ix) Ballistic penetration: Another method of introduction of nucleic acid segments is high velocity ballistic penetration by small particles with the nucleic acid either within the matrix of small beads or particles, or on the surface. Klein et al., Nature, 327:70-73 (1987).
Any technique that can be used to introduce DNA into the animal cells of choice can be employed (e.g., "Transgenic Animal Technology: A Laboratory Handbook," by Carl A. Pinkert (Editor), First Edition, Academic Press; ISBN: 0125571658; "Manipulating the Mouse Embryo: A Laboratory Manual," Brigid Hogan, et al., ISBN: 0879693843, Publisher: Cold Spring Harbor Laboratory Press, Pub. Date: September 1999, Second Edition). Electroporation has the advantage of ease and has been found to be broadly applicable, but a substantial fraction of the targeted cells may be killed during electroporation. Therefore, for sensitive cells or cells which are only obtainable in small numbers, microinjection directly into nuclei may be preferable. Also, where a high efficiency of DNA incorporation is especially important, such as transformation without the use of a selectable marker (as discussed above), direct microinjection into nuclei is an advantageous method because typically 5-25% of targeted cells will have stably incorporated the microinjected DNA. Retroviral vectors are also highly efficient, but in some cases they are subject to other shortcomings, as described by Ellis, J., and A. Bernstein, Molec. Cell. Biol. 9:1621-1627 (1989). Where lower efficiency techniques are used, such as electroporation, calcium phosphate precipitation or liposome fusion, it is preferable to have a selectable marker in the expression cassette so that stable transformants can be readily selected, as discussed above.

In some situations, introduction of the heterologous DNA will itself result in a selectable phenotype, in which case the targeted cells can be screened directly for homologous recombination. For example, disrupting the hprt gene results in resistance to 6-thioguanine. In many cases, however, the transformation will not result in such an easily selectable phenotype and, if a low efficiency transformation technique such as calcium phosphate precipitation is being used, it is preferable to include in the expression cassette a selectable marker such that the stable integration of the expression cassette in the genome will lead to a selectable phenotype. For example, if the introduced DNA contains a neo gene, then selection for integrants can be achieved by selecting cells able to grow on G418.

Transgenic animals prepared as above are useful for practicing the methods of the present invention. Operably linking a promoter of interest to a reporter sequence enables persons of skill in the art to monitor a wide variety of biological processes involving expression of the gene from which the promoter is derived. The transgenic animals of the present invention that comprise the expression cassettes of the present invention provide a means for skilled artisans to observe those processes as they occur in vivo, as well as to elucidate the mechanisms underlying those processes.
With respect to transgenic animals carrying expression cassettes that employ a light-generating protein as a reporter sequence, the monitoring of expression of luciferase reporter expression cassettes using non-invasive whole animal imaging has been described (Contag, C. et al., U.S. Patent Nos. 5,650,135 and 6,217,847, issued 22 July 1997 and 17 April 2001, respectively; Contag, P., et al., Nature Medicine 4(2):245-247, 1998; Contag, C., et al., OSA TOPS on Biomedical Optical Spectroscopy and Diagnostics 3:220-224, 1996; Contag, C.H., et al., Photochemistry and Photobiology 66(4):523-531, 1997; Contag, C.H., et al., Molecular Microbiology 18(4):593-603, 1995). Such imaging typically uses at least one photodetector device element, for example, a charge-coupled device (CCD) camera.

Thus, in one exemplary embodiment, transgenic mice carrying expression cassettes comprising control elements derived from Cyp3A11 or CYP3A4 operably linked to a luciferase-encoding reporter sequence may be used to monitor Cyp3A11- or CYP3A4-mediated drug metabolism. The transgenic animals of the present invention that comprise the expression cassettes of the present invention also provide a means for screening analytes that may be capable of modulating such toxicity and metabolic processes and thereby identifying and characterizing compounds for safety, possible toxicity and pharmaceutical applications.

Methods of administration of the analyte include, but are not limited to, injection (subcutaneous, epidermal, intradermal), intramucosal (such as nasal, rectal and vaginal), intraperitoneal, intravenous, oral or intramuscular administration. Other modes of administration include pulmonary administration, suppositories, and transdermal applications. Dosage treatment may be a single dose schedule or a multiple dose schedule. For example, the analyte of interest can be administered over a range of concentrations to determine a dose/response curve. The analyte may be administered to a series of test animals or to a single test animal (given that the response to the analyte can be cleared from the transgenic animal).
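To make the dose/response idea concrete, the sketch below converts reporter signals (for example, photon flux from the liver region) into fold induction over a vehicle-treated baseline and then estimates the dose giving a half-maximal response by linear interpolation on log10(dose) between the two bracketing doses. This is a simple stand-in for fitting a sigmoidal model and is not a method stated in the text; it assumes positive doses and a response that rises with dose, and the example numbers are invented for illustration.

```python
import math
from typing import Optional, Sequence


def fold_induction(signal: float, vehicle_signal: float) -> float:
    """Reporter signal expressed as a multiple of the vehicle (basal) signal."""
    if vehicle_signal <= 0:
        raise ValueError("vehicle signal must be positive")
    return signal / vehicle_signal


def interpolate_ec50(doses: Sequence[float], responses: Sequence[float]) -> Optional[float]:
    """Estimate the half-maximal dose by linear interpolation on log10(dose).

    Assumes all doses are > 0 and the response increases with dose.
    Returns None if no pair of adjacent doses brackets the half-maximal level.
    """
    if len(doses) != len(responses) or len(doses) < 2:
        raise ValueError("need at least two matching dose/response pairs")
    points = sorted(zip(doses, responses))
    low, high = min(responses), max(responses)
    half = low + (high - low) / 2.0
    for (d0, r0), (d1, r1) in zip(points, points[1:]):
        if r0 <= half <= r1 and r1 != r0:
            frac = (half - r0) / (r1 - r0)
            log_dose = math.log10(d0) + frac * (math.log10(d1) - math.log10(d0))
            return 10 ** log_dose
    return None


# Hypothetical example: fold induction measured at four doses (mg/kg).
doses = [1.0, 10.0, 100.0, 1000.0]
folds = [1.0, 2.0, 10.0, 11.0]
print(interpolate_ec50(doses, folds))
```

With real data, a series of animals per dose (or repeated dosing of one animal once the response has cleared) would supply the fold-induction values.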

Thus, in one exemplary embodiment, transgenic mice carrying expression cassettes comprising the Cyp3A11 or CYP3A4 promoter operably linked to a luciferase-encoding reporter sequence may be used to monitor the effects of a candidate compound on Cyp3A11 or CYP3A4 expression. The results of those experiments demonstrate that the transgenic mice of the present invention may be used to screen compounds which may be effective pharmaceutical agents.

The creation and phenotypic characterization of transgenic animals comprising a CYP3A4 transgene is described in Example 6 (see also Figure 22). The characterization methods described in Example 6 may also be applied to the characterization of transgenic animals comprising Cyp3A11 transgenes.

Criteria for selecting a transgenic animal, e.g., a rodent, useful in a model for screening compounds affecting the expression of, for example, the human CYP3A4 gene are generally as follows:

Criterion 1. An increase in reporter gene expression, e.g., luciferase gene expression measured by output of light from the liver region, in response to treatment with dexamethasone (Dex) or rifampicin (Rif): high induction in liver (preferably greater than or equal to 10-fold induction over basal levels) by dexamethasone (e.g., administered at 50 mg/kg body weight) and/or induction in liver (preferably greater than or equal to two-fold induction over basal levels) by rifampicin (e.g., administered at 50 mg/kg body weight).

Criterion 2. Greater induction in the liver region relative to other body regions 
of the whole animal. 

Criterion 3. Basal expression seen in the liver region is greater than or equal to basal expression in other regions of the animal's body. A lower level of intestinal expression, both basal and induced, relative to expression in liver is preferred.
Criterion 4 (may optionally be applied). An increase in reporter gene expression, e.g., luciferase as reporter and expression measured by output of light from the liver region, in response to treatment with at least one compound selected from the following group: phenobarbital (Phenob), nifedipine (Nif), 5-pregnen-3β-ol-20-one-16α-carbonitrile (PCN), and clotrimazole (Clotrim). Additionally, pregnenolone (Preg) may be employed.
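Criteria 1 to 3 can be applied to imaging data as simple comparisons of fold induction and basal signal between body regions. The sketch below assumes a hypothetical data layout in which each region maps to its basal, Dex-induced and Rif-induced reporter signals (e.g., photon flux); it reads the "and/or" of Criterion 1 as satisfied when either threshold is met, and interprets Criterion 2 as the liver showing a higher fold induction than every other region for each inducer. The optional Criterion 4 is not encoded.

```python
"""Apply Criteria 1-3 to per-region reporter signals for one transgenic line.

Hypothetical data layout: {region: {"basal": x, "dex": y, "rif": z}}.
"""

DEX_MIN_FOLD = 10.0  # Criterion 1: >= 10-fold liver induction by dexamethasone
RIF_MIN_FOLD = 2.0   # Criterion 1: >= 2-fold liver induction by rifampicin
INDUCERS = ("dex", "rif")


def fold(region: dict, inducer: str) -> float:
    """Induced signal divided by basal signal for one region."""
    return region[inducer] / region["basal"]


def evaluate_line(regions: dict) -> dict:
    """Return a pass/fail flag for each of Criteria 1-3."""
    liver = regions["liver"]
    others = [r for name, r in regions.items() if name != "liver"]
    c1 = fold(liver, "dex") >= DEX_MIN_FOLD or fold(liver, "rif") >= RIF_MIN_FOLD
    c2 = all(fold(liver, i) > fold(r, i) for r in others for i in INDUCERS)
    c3 = all(liver["basal"] >= r["basal"] for r in others)
    return {"criterion_1": c1, "criterion_2": c2, "criterion_3": c3}


# Hypothetical imaging data for one candidate line.
line_a = {
    "liver": {"basal": 1e5, "dex": 1.5e6, "rif": 3e5},
    "intestine": {"basal": 4e4, "dex": 1.2e5, "rif": 6e4},
}
print(evaluate_line(line_a))
```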

It has been reported that hPXR is the xenobiotic receptor mediating CYP3A4 induction in cell cultures. The seven compounds described above have been shown to activate hPXR in cell culture to various degrees (Xie W et al. 2000; Genes Dev. 2000 Dec 1;14(23):3014-23; Goodwin et al. 1999; Mol Pharmacol. 1999 Dec;56(6):1329-39). It appears that Rif, Clotrim, Nif and Phenob were relatively better inducers of CYP3A4 expression, while Dex, PCN and Preg were weaker inducers. However, a human hepatocyte study suggested Dex was a good CYP3A4 inducer (Ledirac et al. 2000; Drug Metab Dispos. 2000 Dec;28(12):1391-3). The results presented herein show that Clotrim, Dex, PCN, and Nif induced the transgene CYP3A4-luc better than the other three compounds. All drugs except Preg induced the transgene to various degrees; these data support the usefulness of this animal model for screening CYP3A4 inducers.

Experiments performed in support of the present invention indicate that the presence of a functional hPXR gene product is not essential to the use of a CYP3A4 (or Cyp3A11) transgene reporter in a transgenic, non-human animal or in liver-push experiments (see, e.g., Examples 2 and 6).

Cytochrome P450 CYP3A4 is an important human gene that codes for an 
enzyme expressed in liver. The CYP3A4 gene product is believed to be pivotal to the
metabolism of many exogenous chemicals (xenobiotics), including, but not limited to, 
therapeutic drugs, as well as endogenous substances such as steroid hormones. 
Changes in the level of expression of the CYP3A4 gene can dramatically affect a 
drug's elimination and, as such, have a large impact on the drug's effectiveness. 

2.6.0 Monitoring Promoter Activity 

Activity of the transcription control element sequences comprising the expression cassettes and vectors of the present invention may be monitored by detecting and/or quantifying the protein products encoded by the reporter sequences operably linked to those promoters. The particular method used to monitor promoter activity depends on the reporter sequence employed, and may include, for example, enzymatic assay methods, as well as, in the case of reporter sequences which encode light-generating proteins, in vitro or in vivo imaging.

For example, promoter activity in liver push or transgenic animals carrying the expression cassettes of the present invention may be monitored using in vivo bioluminescence imaging (see, for example, Contag, P. R., et al., (1998) Nature Med. 4:245-7; Contag, C. H., et al., (1997) Photochem. Photobiol. 66:523-31; Contag, C. H., et al., (1995) Mol. Microbiol. 18:593-603).

Monitoring promoter activity in turn enables one to monitor the biological 
processes with which that promoter is associated. It may further be employed in 
methods of screening analytes which modulate those processes at the promoter level
(see discussion in the following section). 

Thus, in one aspect of the invention, liver push or transgenic animals carrying 
expression cassettes comprising promoter sequences derived from P450-related gene 
loci such as those described above may be used to monitor drug metabolism and 
possible toxicity.

Monitoring the effect of drugs on Cyp3A11 or CYP3A4 expression by in vivo imaging of Cyp3A11 or CYP3A4 promoter liver push or transgenic mice is still another embodiment of this aspect of the invention. The results of the liver push experiments demonstrate that animals carrying the expression constructs of the present invention may be used to investigate the possible toxicity and metabolism of drugs by their effect on Cyp3A11 or CYP3A4 promoter-mediated gene expression during the process of drug metabolism.

2.7.0 Screening Analytes 

The methods of monitoring promoter activity discussed above may be employed for the purpose of screening analytes (e.g., candidate drugs) which modulate a variety of biological processes, the toxic and metabolic effects of which can be evaluated by determining expression at the promoter level. Screening may be accomplished by means of in vitro assays employing transiently or stably transfected cells, and may also be conducted using the liver push and/or transgenic animals of the present invention discussed above, either by themselves or in conjunction with other wild-type or transformed cells or tissues that have been introduced into those animals. The particular assay method used to measure the effects of various candidate compounds on promoter activity will be determined by the particular reporter sequence present in the expression cassette carried by the cells or animals employed. As discussed above, promoter activity in liver push or transgenic animals carrying constructs employing reporter sequences encoding light-generating proteins may be measured by means of ex vivo assay methods or by means of the in vivo imaging technique referenced previously (employing, for example, a bioluminescent or fluorescent protein reporter).

Thus, one aspect of this invention is the use of the expression cassettes and vectors in screening for toxicity (via induction of P450 promoter activity) or for pharmacologically active agents (or compounds) that modulate P450 (e.g., Cyp3A11, CYP3A4, etc.) promoter activity, either by affecting signal transduction pathways that necessarily precede transcription or by directly affecting transcription of the P450 gene.

For screening purposes, appropriate host cells, preferably liver cells for monitoring Cyp3A11 promoter-mediated expression, are transformed with an expression vector comprising a reporter gene (e.g., luciferase) operably linked to the P450 (e.g., either Cyp3A11 or CYP3A4) gene promoters of this invention. The transformed cells are next exposed to various test substances and then analyzed for expression of the reporter gene. The expression exhibited by these cells can be compared to expression from cells that were not exposed to the test substance. A compound that modulates the activity of the P450 promoter will result in modulated reporter gene expression relative to the control. See, e.g., the Examples below.

Thus, one aspect of the invention is to screen for test compounds that regulate (i.e., stimulate or inhibit) gene expression levels mediated by the P450 (e.g., Cyp3A) locus-derived transcription control elements (e.g., promoters). Screening may be accomplished by, for example, (i) contacting host cells in which the P450 promoter disclosed herein is operably linked to a reporter gene with a test medium containing the test compound under conditions which allow for expression of the reporter gene; (ii) measuring the expression of the reporter gene in the presence of the test medium; (iii) contacting the host cells with a control medium which does not contain the test compound but is otherwise essentially identical to the test medium in (i), under conditions essentially identical to those used in (i); (iv) measuring the expression of the reporter gene in the presence of the control medium; and (v) relating the difference in expression between (ii) and (iv) to the ability of the test compound to affect the activity of the promoter.
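Steps (ii), (iv) and (v) reduce to comparing reporter readings between the test and control media. A minimal sketch, assuming replicate luminescence readings per condition summarized as a ratio of means; the 2-fold and 0.5-fold cut-offs are arbitrary placeholders chosen for illustration, not thresholds given in the text.

```python
from statistics import mean
from typing import Sequence, Tuple


def relate_to_control(
    test_readings: Sequence[float],
    control_readings: Sequence[float],
    up_cutoff: float = 2.0,    # hypothetical threshold for "stimulates"
    down_cutoff: float = 0.5,  # hypothetical threshold for "inhibits"
) -> Tuple[float, str]:
    """Return (test/control ratio of mean reporter signal, qualitative call)."""
    control = mean(control_readings)
    if control <= 0:
        raise ValueError("control signal must be positive")
    ratio = mean(test_readings) / control
    if ratio >= up_cutoff:
        call = "stimulates"
    elif ratio <= down_cutoff:
        call = "inhibits"
    else:
        call = "no clear effect"
    return ratio, call


# Hypothetical duplicate readings: test medium vs. control medium.
print(relate_to_control([400.0, 600.0], [100.0, 100.0]))
```

A real screen would usually add replicate statistics rather than rely on fixed cut-offs.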

Alternatively, the transformed cells may be induced with a transcriptional inducer, such as IL-1 or TNF-alpha, forskolin, dibutyryl-cAMP, or a phorbol-type tumor promoter, e.g., PMA. Transcriptional activity is measured in the presence or absence of a pharmacologic agent of known activity (e.g., a standard compound) or putative toxicity (e.g., a test compound). A change in the level of expression of the reporter gene in the presence of the test compound is compared to that effected by the standard compound. In this way, the ability of a test compound to affect P450 (e.g., Cyp3A11 or CYP3A4) transcription and the relative toxicities of the test and standard compounds can be determined.

Thus, in a further aspect, the present invention provides methods of measuring the ability of a test compound to modulate Cyp3A11 or CYP3A4 transcription by: (i) contacting a host cell in which the Cyp3A11 or CYP3A4 promoter, disclosed herein, is operably linked to a reporter gene with an inducer of the promoter activity under conditions which allow for expression of the reporter gene; (ii) measuring the expression of the reporter gene in the absence of the test compound; (iii) exposing the host cells to the test compound either prior to, simultaneously with, or after contacting the host cells with the inducer; (iv) measuring the expression of the reporter gene in the presence of the test compound; and (v) relating the difference in expression between (ii) and (iv) to the ability of the test compound to modulate Cyp3A11- or CYP3A4-mediated transcription.
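In the inducer-based format, the quantity of interest is how much a compound changes the inducer-driven signal, and the test compound is then judged against a standard compound. The sketch below expresses that change as a percentage of the induction window (induced minus basal signal); the normalization to the window, the function names and the example readings are all assumptions made for illustration.

```python
def percent_modulation(basal: float, induced: float, induced_plus_compound: float) -> float:
    """Change in inducer-driven reporter signal, as % of the induction window.

    Negative values indicate suppression, positive values enhancement.
    """
    window = induced - basal
    if window <= 0:
        raise ValueError("inducer did not raise the signal above basal")
    return 100.0 * (induced_plus_compound - induced) / window


def relative_to_standard(test_effect: float, standard_effect: float) -> float:
    """Ratio of the test compound's effect to the standard compound's effect."""
    if standard_effect == 0:
        raise ValueError("standard compound shows no effect")
    return test_effect / standard_effect


# Hypothetical readings: basal 100, inducer alone 1100.
test = percent_modulation(100.0, 1100.0, 600.0)      # with test compound
standard = percent_modulation(100.0, 1100.0, 350.0)  # with standard compound
print(test, standard, relative_to_standard(test, standard))
```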

Because different inducers are known to affect different modes of signal transduction, it is possible to identify, with greater specificity, compounds that affect a particular signal transduction pathway. Further, Cyp3A11 or CYP3A4 has been shown to be upregulated in processes of drug metabolism. Therefore, such assays provide a means of identifying the toxicity of compounds by their effect on Cyp3A11 or CYP3A4.

This invention also provides transgenic animals useful as models for studying other physiological and pathological processes that involve P450 (e.g., Cyp3A11 or CYP3A4) gene expression.

Various forms of the different embodiments of the invention, described herein, may be combined.
Experimental

Below are examples of specific embodiments for carrying out the present 
invention. The examples are offered for illustrative purposes only, and are not 
intended to limit the scope of the present invention in any way. 

Materials and Methods

Unless indicated otherwise, the experiments described herein were performed 
using standard methods. 



A. PCR Amplification

For PCR amplifications, the reaction mix contained: 5 µl of 10X reaction buffer (no MgCl2); 4 µl of 25 mM MgCl2; 0.4 µl of 25 mM dNTP mix; 0.5 µl of 10 pmol/µl forward primer; 0.5 µl of 10 pmol/µl reverse primer; 1 µl (0.2 µg) of DNA (BAC or genomic); 38.35 µl of H2O; and 0.25 µl of Taq polymerase (Life Technologies). The PCR was carried out as follows: 3 minutes at 94°C; 30 cycles of 30 seconds at 94°C, 30 seconds at 57°C and 1 minute 30 seconds at 72°C; 7 minutes at 72°C; and storage at 4°C.
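As a consistency check, the component volumes listed above add up to 50 µl, and the final concentrations and run time follow directly. The sketch below assumes the stated stock concentrations, treats a 10 pmol/µl primer stock as 10 µM, and ignores ramp times; the text does not say whether "25 mM dNTP mix" refers to each nucleotide or to the total, so the computed dNTP value inherits that ambiguity.

```python
# Components of the PCR mix described above: (volume in µl, stock concentration or None).
# Stock units: buffer in X, MgCl2 and dNTP in mM, primers in µM (10 pmol/µl == 10 µM).
MIX = {
    "10X buffer": (5.0, 10.0),
    "MgCl2": (4.0, 25.0),
    "dNTP mix": (0.4, 25.0),
    "forward primer": (0.5, 10.0),
    "reverse primer": (0.5, 10.0),
    "template DNA": (1.0, None),
    "water": (38.35, None),
    "Taq polymerase": (0.25, None),
}

# Thermocycling steps as (seconds, temperature in °C)
INITIAL_DENATURATION = [(180, 94)]
CYCLE = [(30, 94), (30, 57), (90, 72)]
FINAL_EXTENSION = [(420, 72)]


def total_volume(mix):
    """Sum of all component volumes in µl."""
    return sum(volume for volume, _ in mix.values())


def final_concentrations(mix):
    """C_final = C_stock * V_component / V_total for every component with a known stock."""
    total = total_volume(mix)
    return {name: stock * volume / total
            for name, (volume, stock) in mix.items() if stock is not None}


def program_minutes(cycles=30):
    """Hold time of the program in minutes, ignoring ramps and the 4 °C storage step."""
    fixed = sum(seconds for seconds, _ in INITIAL_DENATURATION + FINAL_EXTENSION)
    per_cycle = sum(seconds for seconds, _ in CYCLE)
    return (fixed + cycles * per_cycle) / 60


print(round(total_volume(MIX), 2))  # 50.0
for name, conc in final_concentrations(MIX).items():
    print(f"{name}: {conc:.2f}")
print(program_minutes())  # 85.0
```

With these inputs the mix is 1X buffer, 2 mM MgCl2 and 0.1 µM of each primer, and the program holds for about 85 minutes before ramping is added.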



B. Southern Blotting

(i) Primers were designed and used to PCR screen a mouse 129/SvJ genomic DNA BAC (bacterial artificial chromosome) library (Genome Systems, Inc., St. Louis, MO) in order to isolate a Cyp3A11 promoter sequence.

A library containing inserts averaging 120 kb, with sizes ranging from 50 kb to 240 kb, was screened. A large genomic DNA fragment that contained the Cyp3A11 promoter region was obtained. Similarly, a large DNA fragment that contained the CYP3A4 promoter was obtained by screening a similarly sized human library.

The Cyp3A11 and CYP3A4 BAC DNA were isolated by CsCl ultracentrifugation and digested with various restriction enzymes for 2 hours. Digested DNA fragments were separated on a 1% agarose gel. The gel was depurinated in 250 mM HCl for 10 minutes and then denatured twice in 20X SSC with 0.5 M NaOH for 20 minutes. DNA was then transferred onto Hybond N+ membrane (Amersham, Piscataway, NJ) with 20X SSC for 1-2 hours using a vacuum blotting apparatus (Stratagene, La Jolla, CA). After transfer, the membrane was cross-linked according to the manufacturer's directions using a UV Cross-Linker (Stratagene, La Jolla, CA) and rinsed with 5X SSC. The membrane was then prehybridized at 60°C for 1-6 hours with prehybridization solution (Stratagene, La Jolla, CA).

Probes were prepared by labeling PCR fragments or isolated DNA. For example, the 1.6 kb promoter fragment amplified by PCR (as described in, e.g., Example 1) was labeled according to the manufacturer's instructions using the Gene Image Random-Prime Labeling and Detection System (Amersham, Piscataway, NJ). Denatured probe was added to the prehybridization solution and the membrane was hybridized overnight at 60°C.

After hybridization, the membrane was washed twice with pre-warmed 1X SSC, 0.1% SDS for 20 minutes at 60°C each time. Subsequently, the membrane was washed twice with pre-warmed 0.5X SSC for 20 minutes at 60°C each time. The membrane was blocked at room temperature for 1 hour using blocking solution (Stratagene, La Jolla, CA) and incubated with antibody conjugated to alkaline phosphatase for 1 hour. After three washes, the substrate CDP-Star was added for 5 minutes. The membrane was exposed to X-ray film for between 1 minute and 3 hours.

(ii) Primers were designed and used to PCR screen a human genomic DNA BAC (bacterial artificial chromosome) library (Genome Systems, Inc., St. Louis, MO) in order to isolate a CYP3A4 promoter sequence. The library contained inserts averaging 120 kb, with sizes ranging from 50 kb to 240 kb. A large genomic DNA fragment that contained the CYP3A4 promoter region was obtained. Southern analysis was performed essentially as described above, with the exception that the PCR fragment probes and isolated DNA probes were CYP3A4-sequence specific.

C. In Vivo Expression Assays: Liver Push Protocol 

In vivo gene expression mediated by Cyp3A11 or CYP3A4 regulatory sequences was assayed by means of "liver push" assays.
Plasmids administered for liver push experiments were injected intravenously according to the method of Liu R, et al. (1999) Human Gene Therapy 10:1735-1737. For example, 2.2 ml of a PBS solution containing the desired Cyp3A11 or CYP3A4 promoter constructs was injected into the tail vein over a period of less than 8 seconds.
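The injected volume in this example (2.2 ml, matching the 22 g mouse described in Example 2C) corresponds to about 10% of body weight, delivered in under 8 seconds. The sketch below assumes that the 10% scaling carries over to other body weights; the scaling rule and the function names are illustrative assumptions rather than part of the cited protocol.

```python
def injection_volume_ml(body_weight_g, fraction_of_body_weight=0.10):
    """Injection volume, assuming ~1 ml per gram of body weight times the chosen fraction."""
    return body_weight_g * fraction_of_body_weight


def minimum_flow_rate_ml_per_s(volume_ml, max_duration_s=8.0):
    """Flow rate needed to deliver the whole volume within the stated time limit."""
    if max_duration_s <= 0:
        raise ValueError("duration must be positive")
    return volume_ml / max_duration_s


volume = injection_volume_ml(22)  # 22 g mouse, as in Example 2C
print(round(volume, 2))  # 2.2
print(round(minimum_flow_rate_ml_per_s(volume), 3))  # 0.275
```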

For Cyp3A11 and CYP3A4, it was previously believed that a co-receptor is necessary for induction by rifampicin. Accordingly, a plasmid expressing hPXR (a human rifampicin co-receptor) was optionally co-administered with the Cyp3A11-luc or CYP3A4-luc constructs. PXR-expressing plasmids were obtained from Dr. Steven Kliewer at Glaxo-Wellcome.

D. Preparation of Transgenic Animals 

The transgenic animals described below were prepared by microinjection into single-cell-stage embryos (see, e.g., U.S. Patent No. 4,873,191 to Wagner, et al. (issued October 10, 1989); Richa, J., (2001) Molecular Biotechnology 17:261-8). The embryos were implanted into pseudo-pregnant females and the offspring were screened by PCR using primers lucF1 (GCCATTCTATCCGCTGGAAGATGG; SEQ ID NO: 11) and lucR4 (CGAnTTACCAGATTTGTAGAGGrrTTAGTTGG; SEQ ID NO: 16). Imaging of animals was done as described below.






E. In Vivo Imaging 

In vivo imaging was performed as described previously (see, e.g., Contag, P. R., et al., (1998) Nature Med. 4:245-7; Contag, C. H., et al., (1997) Photochem. Photobiol. 66:523-31; Contag, C. H., et al., (1995) Mol. Microbiol. 18:593-603; Zhang et al., (2001) Transgenic Res. 10(5):423-34) using either an intensified CCD camera (ICCD; model C2400-32, Hamamatsu, Japan) fitted with a 50 mm f/1.2 Nikkor lens (Nikon, Japan) and an image processor (Argus 20, Hamamatsu), or a cryogenically cooled camera (Roper Scientific, Trenton, NJ) fitted with a 50 mm f/0.95 Navitar lens (Buhl Optical, Pittsburgh, PA), available as an integrated imaging system (IVIS™ Imaging System, Xenogen Corporation, Alameda, CA) controlled using LivingImage® software (Xenogen Corporation, Alameda, CA).

The substrate luciferin was injected into the intraperitoneal cavity at a dose of 150 mg/kg body weight (30 mg/ml luciferin stock) approximately five minutes prior to imaging. Mice were anesthetized either with Nembutal (25-50 mg/kg body weight) or in a gas chamber with an isoflurane/oxygen mixture; the anesthetized mice, with isoflurane tubing placed on their noses, were then positioned on the imaging stage. Mice were typically imaged from the ventral side for 1 minute. Relative photon emission over the liver region was quantified using LivingImage® software (Xenogen, Alameda, CA).
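Because the luciferin dose is specified per unit of body weight, the injected volume depends on both the animal's weight and the stock concentration. A minimal sketch of that arithmetic follows; the 22 g body weight is borrowed from Example 2C for illustration, and the function name is hypothetical.

```python
def injection_volume_ul(dose_mg_per_kg, body_weight_g, stock_mg_per_ml):
    """Volume of stock solution (µl) that delivers a dose given in mg/kg.

    (mg/kg) x g = µg, and µg / (mg/ml) = µl, so no extra unit factors are needed.
    """
    if stock_mg_per_ml <= 0:
        raise ValueError("stock concentration must be positive")
    return dose_mg_per_kg * body_weight_g / stock_mg_per_ml


# 150 mg/kg luciferin for a 22 g mouse (body weight taken from Example 2C for illustration)
print(injection_volume_ul(150, 22, 30))  # 110.0 (30 mg/ml stock)
print(injection_volume_ul(150, 22, 15))  # 220.0 (15 mg/ml stock, as in Example 2C)
```

Halving the stock concentration doubles the injected volume, which is worth keeping in mind when the 30 mg/ml and 15 mg/ml stocks used in different experiments are compared.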


These imaging methods can be used to track events in a test subject over time. For example, a compound may be administered to a subject comprising a light-generating reporter, and photon emission from the subject may be measured after administration of the compound. Repeating such measurements at selected time intervals is typically effective to track an effect of the compound on the level of reporter expression in the subject over time.
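Repeated imaging of the same animal lends itself to expressing each measurement as a fold change over that animal's pre-treatment image, which appears to be how the fold-induction values in the Examples are reported. The sketch below uses hypothetical photon-flux values; normalizing to a single pre-treatment image (rather than, for instance, to a vehicle-treated group) is an assumption of the sketch.

```python
def fold_over_baseline(baseline_flux, flux_by_hour):
    """Fold change of photon flux at each time point relative to the pre-treatment image."""
    if baseline_flux <= 0:
        raise ValueError("baseline flux must be positive")
    return {hour: flux / baseline_flux for hour, flux in sorted(flux_by_hour.items())}


def peak_response(fold_by_hour):
    """Time point (hours) with the largest fold change, and that fold change."""
    hour = max(fold_by_hour, key=fold_by_hour.get)
    return hour, fold_by_hour[hour]


# Hypothetical liver-region photon flux (photons/s) for one animal
pretreatment = 2.0e6
after_compound = {6: 4.0e6, 24: 6.0e6, 48: 1.0e7, 96: 5.0e6}

series = fold_over_baseline(pretreatment, after_compound)
print(series)  # {6: 2.0, 24: 3.0, 48: 5.0, 96: 2.5}
print(peak_response(series))  # (48, 5.0)
```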

F. Western Blot Analysis 

Following final imaging, animals were sacrificed, and their livers were excised and immediately frozen in liquid nitrogen. The liver tissue from each animal was then homogenized separately in 4 volumes of PBS buffer using a Sonic Dismembrator (Fisher Scientific, PA). The protein concentration of each homogenate was measured using the Bradford Reagent (Sigma Chemical Co., St. Louis, MO) according to the manufacturer's recommendations. Proteins in the tissue homogenates were separated by size on a denaturing 10% polyacrylamide gel according to the method of Laemmli, U.K. (1970) and then transferred to a nitrocellulose membrane (BioRad, Emeryville, CA).
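A Bradford measurement such as the one mentioned above is read against a standard curve. As a sketch, assuming absorbance is linear in protein concentration over the range of the standards (the Bradford response flattens at higher concentrations) and using hypothetical BSA standards and readings, an unknown concentration can be interpolated as follows.

```python
def fit_standard_curve(concentrations, absorbances):
    """Least-squares line: absorbance = slope * concentration + intercept."""
    n = len(concentrations)
    if n < 2 or n != len(absorbances):
        raise ValueError("need at least two paired standards")
    mean_x = sum(concentrations) / n
    mean_y = sum(absorbances) / n
    sxx = sum((x - mean_x) ** 2 for x in concentrations)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(concentrations, absorbances))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def protein_concentration(absorbance, slope, intercept, dilution_factor=1.0):
    """Invert the standard curve and correct for any dilution of the homogenate."""
    return (absorbance - intercept) / slope * dilution_factor


# Hypothetical BSA standards (mg/ml) and A595 readings
standards = [0.0, 0.25, 0.5, 1.0]
readings = [0.0, 0.1, 0.2, 0.4]
slope, intercept = fit_standard_curve(standards, readings)
print(round(protein_concentration(0.3, slope, intercept, dilution_factor=10), 2))  # 7.5
```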

Cyp3A11 protein was detected using a primary goat anti-rat Cyp3A2 antibody (GenTest, Woburn, MA, 01801). The secondary antibody was an anti-goat IgG peroxidase-conjugated antibody (Sigma, St. Louis, MO).

Example 1
Vector Construction

A. Mouse Cyp3A11

Cyp3A11 and other expression constructs were constructed as described below.

pBSSK-Cyp3A11S: Figure 5 shows a schematic of the construct designated pBSSK-Cyp3A11S. Briefly, the construct was made as follows. A 1.6 kb fragment of the Cyp3A11 promoter (extending from -1.6 kb to +65 bp) was PCR amplified from mouse genomic DNA using Cyp3A11TopEcoRI.primer (GTTGAATTCCAGCTAATGAGGGCAAAGTTCTCAG, SEQ ID NO: 1) and Cyp3A11BotXhoI.primer (ATCCTCGAGCTTCTCTGTGTTCTCCCTACAACTG, SEQ ID NO: 2). (See also Toide et al. (1997) Arch. Biochem. Biophys. 338:43-49.) pBluescript SK (Stratagene, La Jolla, CA) was digested with EcoRI and XhoI, and the 1.6 kb Cyp3A11 promoter fragment was ligated into the vector.
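Each cloning primer carries a restriction site near its 5' end so that the PCR product can be ligated directionally into the digested vector. A short scan, sketched below with a hypothetical helper, confirms which site each primer carries; the enzyme table covers only enzymes named in this Example, and because all of these recognition sequences are palindromic, scanning one strand also covers the other.

```python
# Recognition sequences of enzymes named in this Example (all palindromic).
SITES = {
    "EcoRI": "GAATTC",
    "XhoI": "CTCGAG",
    "KpnI": "GGTACC",
    "HindIII": "AAGCTT",
    "BglII": "AGATCT",
    "PstI": "CTGCAG",
    "XmaI/SmaI": "CCCGGG",
    "NheI": "GCTAGC",
    "BamHI": "GGATCC",
}


def find_sites(sequence, sites=SITES):
    """Return {enzyme: [0-based start positions]} for every site found in the sequence."""
    seq = sequence.upper()
    hits = {}
    for enzyme, motif in sites.items():
        positions = [i for i in range(len(seq) - len(motif) + 1)
                     if seq[i:i + len(motif)] == motif]
        if positions:
            hits[enzyme] = positions
    return hits


forward = "GTTGAATTCCAGCTAATGAGGGCAAAGTTCTCAG"  # SEQ ID NO: 1
reverse = "ATCCTCGAGCTTCTCTGTGTTCTCCCTACAACTG"  # SEQ ID NO: 2
print(find_sites(forward))  # {'EcoRI': [3]}
print(find_sites(reverse))  # {'XhoI': [3]}
```

In both primers the site starts three nucleotides from the 5' end; a few extra bases of this kind are commonly added so that the enzyme can cut close to the end of a PCR product.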

pBSSK-Cyp3A11M: This construct contains a 6 kb fragment of the mouse Cyp3A11 promoter and was constructed as follows. Primers designated Cyp3A11F1.primer (GGTATGTGGTGCTTGTGTATGCATAC, SEQ ID NO: 3) and Cyp3A11R2.primer (CAGATAGGATTGAGTGAGCCAGAGG, SEQ ID NO: 4) were used to screen BAC clones of mouse genomic DNA (see also Materials and Methods above). One positive BAC clone was selected and analyzed by restriction digests and Southern blotting, as described above, using the 1.6 kb promoter fragment as the probe.

Based on these Southern blots of the selected BAC clone, the PstI fragment of the BAC clone was isolated, subcloned into a sequencing vector and sequenced. Subsequently, the PstI subclone was digested with SmaI/PstI and the resulting 5.9 kb XmaI/PstI fragment was isolated. pBSSK-Cyp3A11S was digested with XmaI/PstI and the backbone (including the downstream promoter region extending from the 3' PstI site to the XhoI site) was isolated. The 5.9 kb XmaI/PstI fragment was cloned into the XmaI/PstI-digested pBSSK-Cyp3A11S backbone. Thus, after ligation, the resulting pBSSK-Cyp3A11M (Figure 6) contained the 5.9 kb SmaI/PstI BAC fragment and additional downstream promoter sequences extending from the 3' PstI site to the XhoI site of pBSSK-Cyp3A11S.
pGL3-I-Basic: A 208 base pair intron fragment was amplified from pCAT-Basic (Promega, Madison, WI) using IntronTopBglII.primer (TCGAGATCTTGCGGCCGCTTAACTGCAGAAGTTG, SEQ ID NO: 5) and IntronBotHindIII.primer (GCCAAGCTTGCGGCCGCTTAAGAGCTG, SEQ ID NO: 6). A yellow-green luciferase with an emission peak of about 540 nm is commercially available in a plasmid vector from Promega (Madison, WI) under the name pGL3; this plasmid was digested with BglII and HindIII. The PCR intron fragment was then ligated into the BglII/HindIII-cut pGL3 vector. The resulting vector, designated pGL3-I-Basic, is shown in Figure 7.

pGL3-I-Cyp3A11S: The 1.6 kb Cyp3A11S promoter was amplified from the Cyp3A11 BAC clone using Cyp3A11TopKpnI.primer (GTTGGTACCCAGCTAATGAGGGCAAAGTTCTCAG, SEQ ID NO: 7) and Cyp3A11BotHindIII.primer (ATCAAGCTTCTTCTCTGTGTTCTCCCTACAACTG, SEQ ID NO: 8). The PCR product was cloned into the pGL3-I-Basic vector, which had been digested with KpnI and HindIII. The resulting construct, pGL3-I-Cyp3A11S, is shown in Figure 2.

pGL3-I-Cyp3A11M: pBSSK-Cyp3A11M was digested with XmaI and XhoI, and the resulting 6 kb fragment was ligated into pGL3-I-Basic, which had previously been digested with XmaI and XhoI. The resulting construct is shown in Figure 3.

pGL3-I-Cyp3A11L: As described above, Southern blotting was carried out on the selected BAC clone using the 1.6 kb promoter fragment as a probe. In addition to the 6 kb PstI fragment used to generate pBSSK-Cyp3A11M, a 10 kb KpnI fragment was also identified. The 10 kb KpnI fragment was subcloned and sequenced, and restriction sites were identified from the sequence. It was determined that the 10 kb fragment overlapped the 5.9 kb pBSSK-Cyp3A11M fragment and also contained additional upstream sequence. The overlapping portion included an NheI site (Figure 6). Both pGL3-I-Cyp3A11M and the 10 kb KpnI subclone were digested with NheI and KpnI. This resulted in a pGL3-I-Cyp3A11M backbone that included the upstream region of the promoter extending from NheI to XhoI in the vector backbone. After the NheI/KpnI fragment of the 10 kb KpnI fragment was ligated into the pGL3-I-Cyp3A11M backbone, the resulting pGL3-I-Cyp3A11L contained approximately 9 kb of promoter sequences, including additional downstream sequence extending from the internal NheI site to the flanking XhoI site of pGL3-I-Cyp3A11M. The resulting construct is shown in Figure 4.

WO 02/088305 PCT/US02/11770

pGL3-I-Cyp3A11XL: In addition to the 6 kb PstI and 10 kb KpnI subclones, an 11 kb XmaI fragment was also subcloned from the Cyp3A11 BAC clone. Sequence data showed that this 11 kb XmaI fragment overlapped with the 10 kb KpnI fragment. A 2.1 kb KpnI fragment from the 11 kb subclone was cloned into the KpnI site of pGL3-I-Cyp3A11L. The orientation of the 2.1 kb KpnI fragment was confirmed by DNA sequencing. The resulting new construct, designated pGL3-I-Cyp3A11XL, contains an 11 kb Cyp3A11 promoter region.

Experiments performed in support of the present invention have provided approximately 9,330 base pairs (Figure 1B; SEQ ID NO: 13) of sequence derived from the Cyp3A11 gene locus, upstream of the protein coding region. The sequence of the promoter region comprising transcription control elements is presented in Figure 1A (SEQ ID NO: 12). The figure includes genomic sequence encompassing the initiation ATG codon. Table 1 indicates the sequences from the Cyp3A11 gene locus, upstream of the protein coding region, that are contained in the constructs described above. The starting and ending positions in Table 1 are given relative to the sequence presented in Figure 1A.
Table 1

Vector Name          Approximate Size of       Starting Position*   Ending Position*
                     Cyp3A11 Locus Fragment
pGL3-I-Cyp3A11S      1.6 kb                    9,334                10,978
pGL3-I-Cyp3A11M      6.0 kb                    5,024                10,978
pGL3-I-Cyp3A11L      8.9 kb                    2,096                10,978
pGL3-I-Cyp3A11XL     11 kb                     1                    10,978

*Position of the Cyp3A11 gene locus fragment relative to Figure 1A.
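The approximate sizes in Table 1 follow from the start and end coordinates. A small check, assuming the positions are 1-based and inclusive (the usual convention for sequence listings), is sketched below.

```python
# (start, end, approximate size in kb) for each construct, relative to Figure 1A
TABLE_1 = {
    "pGL3-I-Cyp3A11S": (9334, 10978, 1.6),
    "pGL3-I-Cyp3A11M": (5024, 10978, 6.0),
    "pGL3-I-Cyp3A11L": (2096, 10978, 8.9),
    "pGL3-I-Cyp3A11XL": (1, 10978, 11.0),
}


def fragment_length(start, end):
    """Length of a 1-based, inclusive coordinate range."""
    return end - start + 1


for name, (start, end, approx_kb) in TABLE_1.items():
    length = fragment_length(start, end)
    print(f"{name}: {length:,} bp (table: ~{approx_kb} kb, computed: {length / 1000:.1f} kb)")
```

All four computed lengths (1,645; 5,955; 8,883; and 10,978 bp) round to the sizes listed in the table.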



B. Human CYP3A4 

CYP3A4 and other expression constructs were constructed as described below.

pGL3-I-CYP3A4L: A 13 kb human CYP3A4 promoter construct was prepared from a human BAC clone (screened by Incyte Genomics, Inc. using primers 3A4Top (GTTGGTACCCTGCAGTGACCACTGCCCCATCATTG, SEQ ID NO: 9) and 3A4Bot (ATCAAGCTTCCTTTCAGCTCTGTGTTGCTCTTTGC, SEQ ID NO: 10)). Goodwin, et al. (WO 9961622 A1, published 2 December 1999) published a 632 bp CYP3A4 promoter sequence lying from approximately 6 kb to approximately 5.4 kb of the 13 kb promoter sequence. Upstream (5') DNA sequences were obtained for the region corresponding to approximately -13 kb to -10.5 kb (KpnI site to BamHI site). Putative hepatocyte nuclear factor 3β (HNF-3β) binding sites were identified in this region. The resulting construct is shown in Figure 9.

pGL3-I-CYP3A4M: A 12.5 kb BamHI fragment from pGL3-I-CYP3A4L containing a 10.5 kb promoter region was cloned into pBSSK (Stratagene, La Jolla, CA). The resulting plasmid was called pGL3-I-CYP3A4M. The resulting construct is shown in Figure 8.

Figure 17A (SEQ ID NO: 14) presents approximately 13 kb of sequence derived from the human CYP3A4 gene locus, upstream of the protein coding region. The figure includes genomic sequence through the initiation ATG codon. A 2.5 kb fragment of the promoter region comprising transcription control elements, identified herein, that affect liver-specific basal expression in mouse liver is presented in Figure 17B (SEQ ID NO: 15).

Table 2 indicates the sequences from the CYP3A4 gene locus, upstream of the protein coding region, that are contained in the constructs described above. The starting and ending positions in Table 2 are given relative to the sequence presented in Figure 17A.
Table 2

Vector Name          Approximate Size of       Starting Position*   Ending Position*
                     CYP3A4 Locus Fragment
pGL3-I-CYP3A4M       10.5 kb                   2,461                12,998
pGL3-I-CYP3A4L       13 kb                     1                    12,998

*Position of the CYP3A4 gene locus fragment relative to Figure 17A.
Example 2

Liver Push Assays
Liver push assays were conducted as described above.
A. Mouse Cyp3A11 Constructs
Briefly, expression constructs described in Example 1A were intravenously injected into mice, and the mice were imaged as described above two weeks later (pre-treatment group). For each expression construct, 3 mice were then treated with 100 µl of DMSO (solvent control), 3 mice were treated with 0.1 mg/g dexamethasone (100 µl), and 3 mice were treated with 0.1 mg/g rifampicin (100 µl). Dexamethasone and rifampicin have previously been shown to induce Cyp3A11 expression (Yanagimoto et al. (1997) Arch. Biochem. Biophys. 340(2):215-8). In addition, mice were treated with a single dose of DMSO, rifampicin (Rif), or dexamethasone (Dex) at the same dosages. The mice were then imaged at various time points after drug administration.
As shown in Figure 10, imaging of mice subjected to liver push with 5 µg of Cyp3A11M and 0 µg of hPXR showed nearly 5-fold induction of luciferase activity approximately 48 hours after dexamethasone administration. DMSO-treated mice showed little or no induction, while rifampicin-treated mice showed some induction of luciferase approximately 48 to 96 hours after drug treatment.

As shown in Figure 12, imaging of mice subjected to liver push with 5 µg of Cyp3A11L and 1 µg of hPXR showed nearly 8-fold induction of luciferase activity approximately 6 hours after rifampicin administration. DMSO-treated mice showed little or no induction, while dexamethasone-treated mice showed approximately 4-fold induction of luciferase approximately 24 to 48 hours after drug treatment.

Thus, modulation of expression mediated by Cyp3A11 transcriptional control elements can be directly monitored in live animals to provide information on the toxicity of a compound.
B. Titration of hPXR

It was previously believed that rifampicin uptake required co-administration of a rifampicin co-receptor such as hPXR. In particular, hPXR has been shown to mediate induction of CYP3A4 expression in human hepatocytes by the drugs dexamethasone and rifampicin; see Pascussi J.M., et al., Mol. Pharmacol. (2000 Aug) 58(2):361-72.

To test this notion, we conducted liver push experiments with varying amounts of hPXR and the Cyp expression constructs described in Example 1. As shown in Figures 11, 13 and 16, PXR is not required for rifampicin uptake or for induction of luciferase activity mediated by CYP3A4L or Cyp3A11M. Indeed, in certain cases, induction of luciferase expression actually decreased when higher dosages of PXR were used.

Thus, administration of a rifampicin co-receptor is not required for rifampicin uptake.
C. Human CYP3A4 Constructs

Briefly, expression constructs described in Example 1B were intravenously injected into mice, and the mice were imaged as described above two weeks later (pre-treatment group). For each expression construct, a 2.2 ml volume of plasmid mixture (pGL3-I-CYP3A4M or pGL3-I-CYP3A4L) was intravenously injected into a 22 gram FVB female mouse over a period of less than 8 seconds. For imaging, the substrate luciferin was injected into the peritoneal cavity at a dose of 150 mg/kg body weight (15 mg/ml luciferin stock). Mice were then anesthetized in a gas chamber with isoflurane/oxygen. Five minutes after luciferin injection, anesthetized mice with isoflurane tubing on their noses were placed on the imaging stage and imaged from the ventral side for 1 minute using the Xenogen IVIS imaging system. Relative photon emission over the liver region was quantified using LivingImage software.

As shown in Figure 14, mice subjected to liver push with 5 µg of pGL3-I-CYP3A4L and 0 µg of hPXR showed approximately 15-fold induction of luciferase activity approximately 6 hours after Rif administration, approximately 13-fold induction 6 hours after dexamethasone treatment, and little or no induction at any time following DMSO administration. As shown in Figure 15 (data shown for the second dose; twelve days after the first dosing, mice were treated with a second dose and imaged), mice subjected to liver push with 5 µg of pGL3-I-CYP3A4L and 1 µg of hPXR showed approximately 40-fold induction of luciferase activity approximately 12 hours after Rif administration, approximately 60-fold induction 12 hours after dexamethasone treatment, and little or no induction at any time following DMSO administration.

Furthermore, the 13 kb promoter (pGL3-I-CYP3A4L) mediated approximately 25-fold higher expression in the liver than the 10.5 kb promoter (pGL3-I-CYP3A4M).

Example 3

High-Throughput Toxicity Screening via the Cyp3A11/CYP3A4 Promoter Sequence
Compounds can be screened for safety and/or possible toxicity by monitoring their ability to modulate Cyp3A11 or CYP3A4 promoter-mediated bioluminescence in transfected cells. Host cells (e.g., liver cells) are transfected with Cyp3A11-Luc or CYP3A4-Luc constructs, for example using lipofectamine (Promega, Madison, WI), plated into 96-well plates and used for high-throughput screening of a compound library. Transfections are carried out according to the manufacturer's instructions or standard protocols. After transfection, the cells are treated with selected compounds for approximately 36 hours; the cells are then lysed with passive lysis buffer (Promega) and assayed for luciferase activity with the Dual-Luciferase Reporter Assay System (Promega).
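The Dual-Luciferase Reporter Assay System reads a firefly luciferase signal and a Renilla luciferase signal from each well. Assuming a Renilla control plasmid is co-transfected for normalization (the text does not state this explicitly), per-well ratios can be averaged per compound and expressed relative to vehicle wells. The plate layout, compound names, and 2-fold cutoff below are hypothetical.

```python
from statistics import mean


def normalized_signal(firefly, renilla):
    """Firefly (Cyp3A11-/CYP3A4-Luc) light divided by the Renilla control light of the same well."""
    if renilla <= 0:
        raise ValueError("control signal must be positive")
    return firefly / renilla


def fold_over_vehicle(wells, vehicle="DMSO"):
    """wells: iterable of (treatment, firefly, renilla); returns mean fold change per treatment."""
    by_treatment = {}
    for treatment, firefly, renilla in wells:
        by_treatment.setdefault(treatment, []).append(normalized_signal(firefly, renilla))
    vehicle_mean = mean(by_treatment.pop(vehicle))
    return {t: mean(values) / vehicle_mean for t, values in by_treatment.items()}


def flag_inducers(folds, threshold=2.0):
    """Treatments whose mean fold change meets the (hypothetical) threshold."""
    return sorted(t for t, fold in folds.items() if fold >= threshold)


plate = [
    ("DMSO", 1000, 100), ("DMSO", 1200, 120),
    ("compound-A", 4000, 100), ("compound-A", 4400, 110),
    ("compound-B", 1100, 100), ("compound-B", 900, 100),
]
folds = fold_over_vehicle(plate)
print(folds)  # {'compound-A': 4.0, 'compound-B': 1.0}
print(flag_inducers(folds))  # ['compound-A']
```

A marked drop in the Renilla signal itself may indicate cytotoxicity rather than a promoter effect, so it may be worth inspecting alongside the ratio.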


Example 4

In Vivo Monitoring of Cyp3A11- or CYP3A4-Mediated Metabolism
A. Mouse Cyp3A11

Transgenic mice carrying the Cyp3A11 promoter-LucYG expression cassette are obtained using known methods for generating transgenic mice (see the discussion above). These animals ("founders") were bred to non-transgenic mates to produce litters ("F1 animals").

F1 animals from the founders are imaged from the age of one to six weeks according to the methods described above. The observed signal intensities were quantified.

These experiments demonstrate that the expression cassettes and transgenic animals of the present invention may be used to monitor Cyp3A11 promoter-mediated expression of bioluminescence in vivo.


B. Human CYP3A4

Transgenic mice were generated using the pGL3-I-CYP3A4M-luc or pGL3-I-CYP3A4L constructs. For pGL3-I-CYP3A4M, the plasmid was digested with BamHI, and the 12.5 kb fragment containing the 10.5 kb CYP3A4 promoter, a chimeric intron, and firefly luciferase cDNA was purified from an agarose gel by electroelution. For pGL3-I-CYP3A4L, the plasmid was digested with PvuI, and the 15 kb fragment containing the 13 kb CYP3A4 promoter, a chimeric intron, and firefly luciferase cDNA was purified from an agarose gel by electroelution.

The purified fragments were then each microinjected into single-cell-stage FVB embryos. The embryos were then implanted into pseudo-pregnant mice. The founders were screened by PCR and imaging, and the resulting transgenic animals were imaged from the age of one to six weeks according to the methods described above. The observed signal intensities were quantified. Luciferase levels were highest in the livers of pGL3-I-CYP3A4L animals and in the intestines of pGL3-I-CYP3A4M animals.

Example 5

Identification of Repeat Sequences and Promoter Regions

A. In the Cyp3A11 Promoter Region

Figure 1A (SEQ ID NO: 12) comprises the nucleotide sequence of a transcriptional control element from the mouse Cyp3A11 gene locus. In the figure, the sequence represents 12,275 nucleotides in total; the translational start codon (ATG) is located at positions 11,003-11,005, a TATA box is located at positions 10,884 to 10,887, and a major transcription start site begins with the C at position 10,914. An approximately 9.3 kb region of the Cyp3A11 gene extends from nucleotide position 1 to 9,330 of Figure 1A, and this approximately 9.3 kb sequence is presented alone in Figure 1B (SEQ ID NO: 13). The present invention also includes a transcriptional control element sequence comprising a polynucleotide of nucleotides 1-11,002 of SEQ ID NO: 12.
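Coordinates in Figure 1A can be converted to conventional promoter numbering, with the transcription start site as +1. The sketch below assumes the common convention that there is no position 0. Under it, the TATA box starts at -30, the 3' end of the Table 1 fragments falls at +65 (consistent with the "+65 bp" end point given for the 1.6 kb fragment in Example 1), and the ATG starts at +90. The same helper applies to the CYP3A4 positions in Figure 17A.

```python
def relative_to_tss(position, tss):
    """Convert a 1-based coordinate to promoter numbering (TSS = +1, no position 0)."""
    if position >= tss:
        return position - tss + 1
    return position - tss


CYP3A11_TSS = 10_914  # major transcription start site in Figure 1A
features = {
    "TATA box (start)": 10_884,
    "ATG (first base)": 11_003,
    "Table 1 fragment end": 10_978,
    "pGL3-I-Cyp3A11S start": 9_334,
}
for name, pos in features.items():
    print(f"{name}: {relative_to_tss(pos, CYP3A11_TSS):+d}")
```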

The approximately 9.3 kb sequence from the mouse Cyp3A11 promoter was used in a BLAST search of GenBank. In one search, a match to a mouse L1 element (LINE family of repeated sequences) (Locus xMUSLLM9) was identified. There are three known families of L1 elements in the Mus genome (Mears, M.L., and Hutchison, C.A., J Mol Evol 2001 Jan;52(1):51-62). L1 elements are believed to be associated with a retrotransposon subfamily in mice (e.g., Goodier J.L., et al., Genome Res 2001 Oct;11(10):1677-85). Aligning this sequence with the 3A11 sequence identified a region with approximately 91% identity (see Figure 18).

In addition, the approximately 9.3 kb 3A11 sequence was analyzed using the RepeatMasker program (http://ftp.genome.washington.edu/cgi-bin/RepeatMasker), which can identify regions that have high homology to known repeated sequences (e.g., LINEs, SINEs, and LTR elements). Three regions (approximately 1-623, 2503-5103, and 6129-6791) were identified as having high homology (91%) to L1 elements (Figure 18). Another region (approximately 623-2503) was shown to be highly homologous to the mouse MaLR family of repeats (Figure 18). The MaLR family of repeats is also thought to be associated with a mammalian retrotransposon-like superfamily (Kelly, R.G., Genomics 1994 Dec;24(3):509-15).

Two primary non-repeat regions of the approximately 9.3 kb 3A11 sequence were identified (approximately 5104-6218 and 6792-9330).
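The non-repeat regions are the gaps left after merging the repeat intervals. A minimal sketch follows, assuming 1-based inclusive coordinates and using the approximate repeat boundaries reported above. Applied to those boundaries, it returns 5104-6128 and 6792-9330, so its first gap ends at 6128 rather than at the approximately 6218 stated in the text; the sketch does not resolve which boundary is more accurate.

```python
def non_repeat_regions(repeats, sequence_length):
    """Return the 1-based, inclusive gaps not covered by any (start, end) repeat interval."""
    gaps = []
    next_uncovered = 1
    for start, end in sorted(repeats):
        if start > next_uncovered:
            gaps.append((next_uncovered, start - 1))
        next_uncovered = max(next_uncovered, end + 1)
    if next_uncovered <= sequence_length:
        gaps.append((next_uncovered, sequence_length))
    return gaps


# Approximate repeat boundaries reported for the ~9.3 kb Cyp3A11 sequence
repeats = [
    (1, 623),      # L1-like
    (623, 2503),   # MaLR-like
    (2503, 5103),  # L1-like
    (6129, 6791),  # L1-like
]
print(non_repeat_regions(repeats, 9330))  # [(5104, 6128), (6792, 9330)]
```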

B. In the CYP3A4 Promoter Region

Figure 17A (SEQ ID NO: 14) comprises the nucleotide sequence of a transcriptional control element from the human CYP3A4 gene locus. In the figure, the sequence represents 13,035 nucleotides in total; the translational start codon (ATG) is located at positions 13,033 to 13,035, a TATA box is located at positions 12,901 to 12,904, and a major transcription start site begins with the A at position 12,930. An approximately 2.5 kb region of the CYP3A4 gene, useful to facilitate expression as described herein, extends from nucleotide position 1 to 2,461 of Figure 17A, and this approximately 2.5 kb sequence is presented alone in Figure 17B (SEQ ID NO: 15).

Similar analyses to those described above were carried out on the approximately 13 kb CYP3A4 promoter region. A summary of the results of this analysis is presented in Figure 19. Two different kinds of repeat sequences were identified: L1 elements and Alu repeats.

Five primary non-repeat regions of the approximately 13 kb CYP3A4 promoter region were identified (approximately 1290-2446, 2758-4111, 4424-6010, 6317-9099, and 9401-12998).

Example 6

Generation of a CYP3A4-luc FVB Transgenic Mouse Line

A. Plasmid construction 

A CYP3A4-luc reporter was designed essentially as described in Example 1B, briefly as follows. A BAC clone containing the human CYP3A4 promoter region was screened by PCR using primers 5'-GTTGGTACCCTGCAGTGACCACTGCCCCATCATTG-3' (SEQ ID NO: 9), corresponding to nt -1105 to -1080, and 5'-ATCAAGCTTCCTTTCAGCTCTGTGTTGCTCTTTGC-3' (SEQ ID NO: 10), corresponding to nt +40 to +69 of the CYP3A4 promoter region. The primers were also used to amplify a 1.2 kb promoter region of CYP3A4 from human genomic DNA using Pfu DNA polymerase (Stratagene, La Jolla, CA). The PCR product was digested with KpnI/HindIII and purified from an agarose gel using a Geneclean Kit (Qbiogene, Carlsbad, CA). The 1.2 kb promoter region was cloned into the pGL3-Basic vector (Promega, Madison, WI) containing the modified firefly luciferase cDNA sequences. A 233 bp HindIII fragment containing a chimeric intron from the pCAT-3-Basic vector (Promega, Madison, WI) was then inserted between the CYP3A4 promoter region and the luciferase gene. A 1.88 kb KpnI/BglII fragment, a 950 bp BglII fragment, and a 10 kb KpnI fragment subcloned from the BAC clone were inserted sequentially into the previous construct. The final construct, pGL3-I-CYP3A4, contains a 13 kb human CYP3A4 promoter region, a 233 bp chimeric intron, and modified firefly luciferase cDNA. All junctions in the construct were confirmed by DNA sequencing (Stanford PAN Facility, Stanford, CA). The entire sequence of the CYP3A4-luc transgene is shown in Figure 17C (SEQ ID NO: 17).

B. Generating CYP3A4-luc Transgenic (Tg) Mice

The transgenic lines were created by the microinjection method (see, e.g., U.S. Patent No. 4,873,191 to Wagner et al. (issued October 10, 1989); and Richa, J. (2001) "Production of transgenic mice," Molecular Biotechnology 17:261-8) using FVB donor embryos.



C. Screening Tg Mice



Eighteen founder mice were screened by PCR using luciferase primers LucF1 and LucR4, or primers Luc 3 (5'-GAAATGTCCGTTCGGTTGGCAGAAGC-3'; SEQ ID NO: 18) and Luc 4 (5'-CCAAAACCGTGATGGAATGGAACAACA-3'; SEQ ID NO: 19). These same primers were also used to screen Tg offspring.

(i) PCR Screening

Figure 20 presents exemplary results of PCR screening of CYP3A4-luc Tg mice. In the figure: Lane 1, DNA ladder; Lane 2, negative littermate; Lane 3, positive littermate. The results demonstrate the identification of CYP3A4-luc Tg mice.

(ii) Southern Hybridization Analysis

The 1.8 kb HindIII/XbaI fragment from pGL3-Basic containing the entire luciferase cDNA (Promega Corp.) was used as the probe for Southern hybridization. Ten µg of heterozygous genomic DNA was digested with BamHI, and 17 pg of pGL3-Basic was loaded as a positive control. The expected size of the transgene was 12 kb. Exemplary screening results of FVB/N-TgN(CYP3A4-luc) mice by Southern hybridization are presented in Figure 21. In the figure: Lane 1, 10 µg of CYP3A4-luc Tg genomic DNA; Lane 2, 17 pg of pGL3-Basic positive control. These results demonstrate the presence of the transgene in the transgenic mice.
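The amount of plasmid loaded as a positive control can be related to the amount of genomic DNA through copy-number equivalence. The text does not state how 17 pg was chosen. As a sketch, assuming a pGL3-Basic size of 4,818 bp and a mouse haploid genome of roughly 2.7 to 3.0 Gb (both assumptions here), one copy per haploid genome in 10 µg of genomic DNA corresponds to roughly 16 to 18 pg of plasmid, which is in the range of the 17 pg loaded.

```python
def single_copy_equivalent_pg(genomic_dna_ug, plasmid_bp, haploid_genome_bp):
    """Plasmid mass (pg) holding as many copies as there are haploid genomes in the genomic DNA.

    Copy number scales as mass / length for both molecules, so equal copy numbers require
    plasmid_mass = genomic_mass * plasmid_bp / haploid_genome_bp.
    """
    genomic_dna_pg = genomic_dna_ug * 1e6  # 1 µg = 1e6 pg
    return genomic_dna_pg * plasmid_bp / haploid_genome_bp


PGL3_BASIC_BP = 4818  # assumed plasmid size
for genome_bp in (2.7e9, 3.0e9):  # assumed range for the mouse haploid genome
    print(round(single_copy_equivalent_pg(10, PGL3_BASIC_BP, genome_bp), 1))  # 17.8, then 16.1
```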



D. Phenotyping Data as Applied to Selection Criteria

General methods for evaluating the animal lines were as follows. Tg founders were bred to wild-type FVB mice to generate F1 mice. A female transgenic founder was bred to a wild-type FVB male, and a male transgenic founder was bred to a few wild-type FVB females.

A luciferin stock solution of 30 mg/ml was prepared in sterile PBS. Luciferin was purchased as D-Luciferin Potassium Salt (Cat # XR-1001, Lot # 14021/2) from Biosynth AG, Switzerland.

Dexamethasone (Cat # D1756), rifampicin (Cat # R3501), pregnenolone (Cat # P9129), clotrimazole (Cat # C6019), nifedipine (Cat # N7634), 5-pregnen-3β-ol-20-one-16α-carbonitrile (Cat # P0534), and phenobarbital (Cat # P3761) were all purchased from Sigma (St. Louis, MO). Phenobarbital was prepared in PBS buffer and the others in DMSO.

The route of administration for both the drugs and luciferin was intraperitoneal (IP). Doses were as follows. Luciferin: 150 mg/kg, taken from the 30 mg/ml luciferin stock, was injected IP five minutes before imaging in the IVIS™ system (Xenogen Corporation, Alameda, California). Chemicals: all drugs except phenobarbital were prepared in DMSO and injected IP at a dose of 50 mg/kg; phenobarbital was prepared in PBS and injected at 100 mg/kg. DMSO was administered as a vehicle control. Treatment typically lasted 2-3 days.
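The luciferin dose above can be converted into an injection volume for each animal. The sketch below uses only the figures stated in this protocol (150 mg/kg from a 30 mg/ml stock, which works out to 5 µl of stock per gram of body weight); the function name and the 25 g body weight are hypothetical. The same arithmetic would apply to the 50 mg/kg and 100 mg/kg drug doses once a stock concentration is chosen, which the protocol does not state.

```python
def injection_volume_ul(body_weight_g: float, dose_mg_per_kg: float,
                        stock_mg_per_ml: float) -> float:
    """Return the volume of stock (microlitres) that delivers a mg/kg dose.

    Hypothetical helper for illustration only.
    """
    if body_weight_g <= 0 or stock_mg_per_ml <= 0:
        raise ValueError("body weight and stock concentration must be positive")
    dose_mg = dose_mg_per_kg * body_weight_g / 1000.0   # body weight g -> kg
    volume_ml = dose_mg / stock_mg_per_ml               # mg / (mg/ml) -> ml
    return volume_ml * 1000.0                           # ml -> microlitres


# Luciferin at 150 mg/kg from the 30 mg/ml stock, for a hypothetical 25 g mouse.
print(injection_volume_ul(25.0, 150.0, 30.0))  # 125.0
```

For a 25 g mouse this gives 3.75 mg of luciferin, delivered in 125 µl of stock.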

Following luciferin administration, the animals were anesthetized with isoflurane gas and placed in the IVIS™ imaging chamber (Xenogen Corporation, Alameda, California). All animals were imaged before and after chemical administration at high resolution (binning 2), with exposure times of 10 seconds for males and 1 minute for females.
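Because males and females were imaged with different exposure times (10 seconds versus 1 minute), their raw signals cannot be compared directly. A minimal sketch follows. It assumes that the camera response is linear in exposure time and that each measurement is an integrated photon count from a region of interest, and it normalizes each count to counts per second. The function name and the example counts are hypothetical.

```python
# Exposure times stated in the imaging protocol (seconds).
EXPOSURE_S = {"male": 10.0, "female": 60.0}


def counts_per_second(total_counts: float, sex: str) -> float:
    """Normalize an integrated region-of-interest count to its exposure time.

    Hypothetical helper; assumes signal scales linearly with exposure.
    """
    try:
        exposure = EXPOSURE_S[sex]
    except KeyError:
        raise ValueError(f"unknown sex label: {sex!r}") from None
    return total_counts / exposure


# Hypothetical raw counts: both normalize to the same rate.
print(counts_per_second(50_000, "male"))     # 5000.0
print(counts_per_second(300_000, "female"))  # 5000.0
```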

Induction of CYP3A4-luc by the typical CYP3A4 inducers dexamethasone (50 mg/kg body weight) and rifampicin (50 mg/kg body weight) was evaluated in the animals. F1 mice from each founder (i.e., mice PCR-positive for the presence of the transgene) were imaged at T=0 (pretreatment) and at T=3 hours and T=6 hours following administration of DMSO, dexamethasone (Dex), or rifampicin (Rif). This was performed on groups of three mice (including both genders) from nine of IS founder lines. Primary screening results from the Dex and Rif treatments are described below:

(a) Lines #75, #195, #230, and #240 showed induction in the intestine region, with higher intestinal basal expression than liver basal expression.

(b) Line #225 showed induction by Dex and Rif in liver and intestine, but had higher intestinal basal expression than liver basal expression.

(c) Line #233 showed strong induction in the intestine and slight induction in the liver region by Dex and Rif. This line had high intestinal basal expression.

(d) Line #221 showed no induction by either drug and had a very low level of basal expression.

(e) Lines #82 and #208 showed stronger induction by Dex and Rif in liver than in other regions, and basal expression in liver was greater than basal expression in intestine; males responded more strongly than females.
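The qualitative descriptions above compare each region's post-treatment signal with its pretreatment signal. The source does not state its exact quantification, so the sketch below rests on an assumption: induction is taken as the ratio of a mouse's T=6 hour signal to the same mouse's T=0 signal, averaged over a group of three mice. All signal values are hypothetical.

```python
from statistics import mean


def fold_induction(pretreatment: float, post_treatment: float) -> float:
    """Ratio of post-treatment signal (e.g., T=6 h) to the same mouse's T=0 baseline."""
    if pretreatment <= 0:
        raise ValueError("baseline signal must be positive")
    return post_treatment / pretreatment


def group_fold_induction(signals: list[tuple[float, float]]) -> float:
    """Mean per-mouse fold induction for a group of (T=0, T=6 h) signal pairs."""
    return mean(fold_induction(pre, post) for pre, post in signals)


# Hypothetical liver-region signals (counts/s) for three Dex-treated mice.
liver_dex = [(1.0e4, 4.0e4), (2.0e4, 6.0e4), (1.5e4, 7.5e4)]
print(group_fold_induction(liver_dex))  # 4.0
```

Taking per-mouse ratios before averaging keeps differences in basal expression between individual animals from dominating the group value.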

The data were evaluated against the selection criteria described earlier in the specification: (i) induction of gene expression in the liver (Lines #225, #233, #82, and #208 satisfied this criterion); (ii) greater induction in the liver region than in other regions, e.g., the intestine (Lines #82 and #208 satisfied this criterion); and (iii) basal expression in the liver greater than or equal to basal expression in other parts of the body, e.g., the intestine (Lines #82 and #208 satisfied this criterion).
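The three selection criteria can be expressed as a simple filter over per-line summaries. The sketch makes three assumptions: "other regions" is reduced to the intestine, criterion (i) uses a hypothetical fold-induction threshold of 1.0 (the primary screen states no numeric cut-off), and the line names and values are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class LineSummary:
    name: str
    liver_fold: float        # fold induction in the liver region
    intestine_fold: float    # fold induction in the intestine region
    liver_basal: float       # basal (T=0) liver signal
    intestine_basal: float   # basal (T=0) intestine signal


def meets_criteria(s: LineSummary, induction_threshold: float = 1.0) -> bool:
    """Apply criteria (i)-(iii); the threshold is a hypothetical parameter."""
    induced_in_liver = s.liver_fold > induction_threshold         # criterion (i)
    liver_induction_highest = s.liver_fold > s.intestine_fold     # criterion (ii)
    liver_basal_not_lower = s.liver_basal >= s.intestine_basal    # criterion (iii)
    return induced_in_liver and liver_induction_highest and liver_basal_not_lower


candidates = [
    LineSummary("hypothetical A", 5.0, 2.0, 3.0e4, 1.0e4),  # liver-dominant
    LineSummary("hypothetical B", 3.0, 1.5, 1.0e4, 5.0e4),  # intestinal basal higher
]
print([s.name for s in candidates if meets_criteria(s)])  # ['hypothetical A']
```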
Mice satisfying the above criteria were typically selected for subsequent analysis. Because Lines #82 and #208 showed strong induction by Dex and Rif in the liver and had higher basal luciferase levels in the male liver region, they were selected for secondary screening.

Secondary screening:

Lines #82 and #208 met the first set of criteria. In particular, transgene (CYP3A4-luc) expression was induced in the liver region by both Dex and Rif, and the animals showed higher basal transgene expression in the liver, at least in males. The two lines looked almost identical in primary screening. Line #82 was further characterized with seven compounds believed to induce CYP3A4 expression: pregnenolone (Preg), phenobarbital (Phenob), rifampicin (Rif), nifedipine (Nif), 5-pregnene-3b-OL-20-ONE-16a-Carbonitrile (PCN), dexamethasone (Dex), and clotrimazole (Clotrim). Expression was evaluated in both genders. Exemplary results are presented in Figure 22, panels A-D, described below. Because Line #82 responded to most of the CYP3A4 inducers in the liver, this line was chosen as the final line, designated FVB/N-TgN(CYP3A4-luc).
The results of an exemplary analysis of induction of the CYP3A4-luc transgene in Tg mice are presented in Figure 22, panels A-D. In the figure, mice were imaged at T=0 (pretreatment) and T=6 hours following administration of DMSO, pregnenolone (Preg), phenobarbital (Phenob), rifampicin (Rif), nifedipine (Nif), 5-pregnene-3b-OL-20-ONE-16a-Carbonitrile (PCN), dexamethasone (Dex), or clotrimazole (Clotrim). NT is the non-treated control. Before each imaging session, mice were injected i.p. with 150 mg/kg luciferin. Panel A presents exemplary induction data for nine male mice, each treated with the compound shown in the legend at the bottom of Panel C. Panel C presents exemplary induction data for nine female mice, each treated with the compound shown in the legend at the bottom of Panel C. Panel B presents a bar graph of similar induction experiments, with the results for each treatment (shown at the bottom of Panel D) given for a group of three male mice. Panel D presents a bar graph of similar induction experiments, with the results for each treatment (shown at the bottom of Panel D) given for a group of three female mice. In Panels B and D, measurements on each mouse were performed as described above, and associated error bars are shown.
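The bar graphs in Panels B and D summarize each treatment over a group of three mice with error bars. The source does not say whether the error bars are standard deviations or standard errors. The sketch below assumes the standard error of the mean, and its values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def mean_and_sem(values: list[float]) -> tuple[float, float]:
    """Return the group mean and the standard error of the mean (SD / sqrt(n))."""
    if len(values) < 2:
        raise ValueError("need at least two measurements for an error estimate")
    return mean(values), stdev(values) / sqrt(len(values))


# Hypothetical fold-induction values for one treatment group of three mice.
group_mean, group_sem = mean_and_sem([2.0, 4.0, 6.0])
print(group_mean, round(group_sem, 3))  # 4.0 1.155
```

If the error bars in the figure are standard deviations instead, `stdev(values)` alone would be plotted.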

The results of this analysis demonstrate that CYP3A4-luc Tg mice having desirable phenotypes, as outlined in the above criteria, can be identified by the methods taught herein.

As is apparent to one of skill in the art, various modifications and variations of the above embodiments can be made without departing from the spirit and scope of this invention. These modifications and variations are within the scope of this invention.
What is Claimed Is:

1. A method for identifying an analyte that modulates expression of a reporter sequence, wherein expression of said reporter sequence is mediated by transcription control elements derived from a human CYP3A4 gene, in a transgenic, living rodent, said method comprising

administering said analyte to a transgenic, living rodent, said transgenic rodent comprising an expression cassette comprising a polynucleotide derived from the human CYP3A4 gene, said polynucleotide having at least 95% or greater identity to nucleotides 1-13,032 of SEQ ID NO: 14, wherein (i) said polynucleotide is operably linked to a coding sequence of interest, (ii) the polynucleotide comprises at least one transcriptional control element, and (iii) expression of said coding sequence of interest is induced in the liver of the living, transgenic rodent by dexamethasone or rifampicin; and

monitoring expression of said coding sequence of interest, wherein an effect on the level of expression of said coding sequence of interest indicates that the analyte affects expression mediated by transcription control elements derived from the human CYP3A4 gene.

2. The method of claim 1, wherein (iv) basal expression of said coding sequence in the liver region of the living, transgenic rodent is greater than or equal to that in other regions of the body of the living, transgenic rodent.

3. The method of any of claims 1-2, wherein said transgenic rodent does not have sequences encoding a functional hPXR (a human rifampicin co-receptor).

4. The method of any of claims 1-3, wherein expression of said coding sequence of interest is induced in the living, transgenic rodent by at least one compound selected from the group consisting of phenobarbital, nifedipine, 5-pregnene-3b-OL-20-ONE-16a-Carbonitrile and clotrimazole, wherein induction of expression is seen in the liver region of the living, transgenic rodent.
5. The method of any of claims 1-4, wherein expression of said coding sequence of interest is induced in the living, transgenic rodent by dexamethasone administered at 50 mg/kg body weight.

6. The method of any of claims 1-4, wherein expression of said coding sequence of interest is induced in the living, transgenic rodent by rifampicin administered at 50 mg/kg body weight.

7. The method of claim 5, wherein induction of expression of the coding sequence of interest is greater than or equal to 10-fold induction by dexamethasone over basal levels.

8. The method of claim 6, wherein induction of expression of the coding sequence of interest is greater than or equal to two-fold induction by rifampicin over basal levels.

9. The method of any of claims 1-8, wherein the coding sequence of interest 
is a reporter sequence. 

10. The method of claim 9, wherein the reporter sequence encodes a light-generating protein.

11. The method of claim 10, wherein the light-generating protein is a 
bioluminescent protein or a fluorescent protein. 
12. The method of claim 11, wherein the bioluminescent protein is luciferase.

13. The method of claim 12, wherein said expression cassette comprises SEQ ID NO: 17.

14. The method of claim 11, wherein the fluorescent protein is selected from the group consisting of blue fluorescent protein, cyan fluorescent protein, green fluorescent protein, yellow fluorescent protein, and red fluorescent protein.

15. The method of any of claims 1-14, wherein said rodent is a mouse or a rat. 

16. A transgenic rodent comprising an expression cassette comprising a polynucleotide derived from the human CYP3A4 gene, said polynucleotide having at least 95% or greater identity to nucleotides 1-13,032 of SEQ ID NO: 14, wherein (i) said polynucleotide is operably linked to a coding sequence of interest, (ii) the polynucleotide comprises at least one transcriptional control element, and (iii) expression of said coding sequence of interest is induced in the liver of the living, transgenic rodent by dexamethasone or rifampicin.

17. An expression cassette comprising a polynucleotide derived from the mouse Cyp3A11 gene, said polynucleotide having at least 95% identity to nucleotides M 1,002 of SEQ ID NO: 12, said polynucleotide operably linked to a coding sequence of interest, wherein the polynucleotide comprises at least one transcriptional control element.



18. An expression cassette comprising a polynucleotide derived from the mouse Cyp3A11 gene, said polynucleotide comprising a polynucleotide having at least 95% identity to the sequence of SEQ ID NO: 13 or fragments of at least about 100 contiguous nucleotides of SEQ ID NO: 13, said polynucleotide or fragments thereof operably linked to a coding sequence of interest, wherein the polynucleotide or fragments thereof comprise at least one transcriptional control element.

19. The expression cassette of claim 18, wherein said polynucleotide comprises a first polynucleotide having 95% identity or greater to nucleotides 5104-621S of SEQ ID NO: 13 and a second polynucleotide having 95% identity or greater to nucleotides 6792-9330 of SEQ ID NO: 13.

20. The expression cassette of any of claims 18-19, wherein the coding sequence of interest is a reporter sequence.
21. The expression cassette of claim 20, wherein the reporter sequence encodes a light-generating protein.

22. The expression cassette of claim 21, wherein the light-generating protein is a bioluminescent protein or a fluorescent protein.

23. The expression cassette of claim 22, wherein the bioluminescent protein is luciferase.


24. The expression cassette of claim 22, wherein the fluorescent protein is 
selected from the group consisting of blue fluorescent protein, cyan fluorescent 
protein, green fluorescent protein, yellow fluorescent protein, and red fluorescent 
protein. 


25. A vector comprising 

(a) the expression cassette of any of claims 18-24; and 

(b) a vector backbone. 

26. The vector of claim 25, wherein said vector backbone further comprises a selectable marker.

27. A cell comprising an expression cassette of any of claims 18-24. 

28. A cell comprising a vector of claim 25.

29. A transgenic rodent comprising the expression cassette of any of claims 18-24.

30. The transgenic rodent of claim 29, wherein said rodent is a mouse or a rat.



31. A rodent whose liver comprises an expression cassette of any of claims 18-24.

32. The rodent of claim 31, wherein said rodent is a mouse or a rat. 

33. A method for identifying an analyte that modulates expression of a reporter sequence, wherein expression of said reporter sequence is mediated by transcription control elements derived from the mouse Cyp3A11 gene, in a transgenic, living rodent, said method comprising

administering to a transgenic, living rodent said analyte, wherein said transgenic rodent comprises an expression cassette of any of claims 20-24; and

monitoring expression of said reporter sequence, wherein an effect on the level of expression of said reporter sequence indicates that the analyte affects expression mediated by transcription control elements derived from the mouse Cyp3A11 gene.

34. A method for identifying an analyte that modulates expression of a reporter sequence, wherein expression of said reporter sequence is mediated by transcription control elements derived from the mouse Cyp3A11 gene, in a living rodent, said method comprising

administering to a living rodent a vector mixture comprising an expression cassette of any of claims 20-24;

administering to said rodent said analyte; and

monitoring expression of said reporter sequence, wherein an effect on the level of expression of said reporter sequence indicates that the analyte affects expression mediated by transcription control elements derived from the mouse Cyp3A11 gene.


35. The method of claim 34, wherein administering the vector mixture to the 
rodent comprises intravenous injection of said vector mixture. 

36. A method for monitoring expression of a reporter sequence in a cell, wherein expression of said reporter sequence is mediated by transcription control elements derived from the mouse Cyp3A11 gene, said method comprising:

monitoring the expression of a reporter sequence in a cell, said cell comprising an expression cassette of any of claims 20-24, wherein expression of said reporter sequence in the cell is indicative of expression mediated by transcription control elements derived from the mouse Cyp3A11 gene.






wo 02/088305 PCT/US02/I J770 


FIGURE 1A (Page 1 of 12)



GGTACCTGGT ATCTGTCCAG AAATTCATCC ATTTCATCCA GGTTTTCCAG TTTTGTTGAG 60 
TATAGCTTTT TGTAGAAGGA TCTGATGGTG TTTTGGATTT CTTCAGGATC TGTTGTTATG 120
TCTCCCTTTT CATTTCTGAT TTTGTTAATT AGGATGTTGT CCCTGTGCCC TCTAGTGAGT 180
CTAGCTAAGG GTTTATCTAT CTTGTTGATT TTCTCAAAGA ACCAACTCCT CGTTTGGTTA 240
ATTCTTTTAA TAGTTCTTCT TGTTTCCACT TGGTTGATTT CACCCCTGAG TTTGATTATT 300
TCCTGCTGTC TA^TCCTCTT GGGTGAATTT TCTTCCTTTT TTTTCTAGAG CTTTTAGATG 360
TGTTGTCAAG CTGCTAGTGT ATGCCCTCTC CAGTTTCTTC TTGGAGGCAC TCAGAGCTAT 420
GAGTTTCCCT CTTAGAAATG CTTTCATTGT GTCCCATAGG TTTGGGTATG TTGTGGCTTC 480
GTTTTCATTA AACTCTAAAA AGTCTTTAAT TTCTTTCTTT ATTCCTTCCT TGACCAAGGT 540
ATCATTGAGA AGAGTGTTGT TCAGTTTCCA TTTGAATGTT TGCTTTCCAT TATTTAATGT 600
TGCCTTAGTC CATGGTGGTC TGTGTCTTAG TCAGGGTTTC TTTTCCTGCA CAAACATCAT 660
GACCAAGAAA CAAGTTGGGG ATGAAAGGGT TTATTCAGCT TACACTTCCA TGCTGCTGTT 720
CATCACCAAA GGAAGTCAGG ACTGGAACTC AAACAGATCA GGGAGCAGGA GCTGATGCAG 780
AGGCCATGGA GGGATGTTCT TTACTGGCTT GCCTTCCCTG GCTTGCTCAG CCTGCTCTCT 840
TATAGAATCC AAGACTACCA GCCCAGAGAT GGCACCACCC ACAAGGGGCC TTTCCCCCTT 900
GATCACTAAT TGAGAAAATG CCTTACAGTT GGATCTCATG GAGGCATTTC CTCAACTGAA 960
GCTCCTTTCT CTGTGATATC TCCAGCTGTG TCAAGTTGAC ACAAAACTAG CCAGTACAAT 1020
TGACCCCTTG TCAACTTGAC ACACAAACAC ATCACTAGTA ACCCTCAACC CTTACATTCT 1080
TATTCATCCC CAAGATCTAA ATAACTTTAA AAGTCCCACA GTCTTTACAT ATTCTTAAAA 1140
TTTCAATCTC TTTAAAATAT CCATCTCTTT TAAAATCCAA AGTCTTTTTA CAATTAAAAC 1200
TCTCTTAACT ATGGCCTCCA CTAAAACAGT TTCTTCCTTC AAGAGGGAAA ATATCAGGGC 1260
ACAGTCAAAG CAAAAATCAA TCTCCAACCA TCCAATGTCT GGGATCCAAC TCACAATCTT 1320
CTGGACTCCT CCAAGGGCTT GTGTCACTTC TCCAGCCATG CCCTTTGTAG CACAGGTGTC 1380
ATCCTCTAGG TTGCAGATGC CTGTACTCCA CTGATGCTGC TGCTCTTGGT GGTCATCTCA 1440
TGGTACTGGC ATCTCCAAAA CACTGCATGG CCCCTTCAGT CCTGGGCCTT CAATTGCAAC 1500
TGAGGCTGCA CCGTCACCAA TGGCCTTCCA TGCCCTCTCA CAGTGCCGAG CCTCAGCTGC 1560
TGTGCATGAC CCCTTCATGC CTTCAAAACC AGTACCACCT GGGTGACCCT TATACATTAC 1620
CAAGTCCCAC TGCAGCAGGA GTACAACCTT GGCTATCTCT GGAACACAGC CTCTTTGTGC 1680
TTTCAGAAAA CACTTCCCAG AAGATGTCAC CTCAACGACG CTGGTCTCTT CTTAATCACC 1740
GATAATTTCT TAGCTCCAGC TAACCAGCAT CAATAGTCAT AGTAATGCAA GGTTTTGCTT 1800
TAGTAGTTCT GGTATCTTGT TAATCACAGT TGATTCTTCA GCCCCAGCTA ACCAGAACTA 1860
CAGAATTTTC ACAATCAAAA CAGCAATGGC CCTGAAAAGA GTCTTTAATT TTCCCTCTGA 1920
AATTTCACAA GCCAGACCTC CATCTTCTGC ACTGTTCTCA ACATTATCTT GTAAGCTCCT 1980
ACACAACATC TGACAGAGCT CTTAACAATG AACGGATCTT CAAGCCGAAA GTTCCAAAGT 2040
CCTTCCACAG TCCTCCCCAA AACATGGTCA GGTTGTCACA GGAATACCCC ACTCCTGGTA 2100
CCAATTTGTC TTAGTCAGGG TTTCTATTCC TGCACAAACA TCATGGCCAA GAAACAAGTT 2160
GGGGAGGAAA GGGTTTATTT AGCTTACACT TCCATGCTGC TGTTCATCAC CAAAGGAAGT 2220
CAGGACTGGA ACTCAAACAG GTCAGGGAGC AGGAGCTGAT GCAGAGGCCA TGGAGGGATG 2280
TTCTTTACTG GCTTGCCTTC CCTGGCTTGC TCAGCCTGCA CTCTTATAGA ATCCAAGACT 2340
ACCAGCCCAG AGATGGCACC ACCCACAAGG GGCCTTTCCC CCTTGATCAC TAATTGAGAA 2400
AATGCCTTAC AGTTGGATCT CATGGAGGCA TTTCCTCAAC TGAAGCTCCT TTCTCTGTGA 2460
TATCTCCAGC TGTGTCAAGT TGACACAAAA CTAGCCAGTA CAGTCTGATA GGATGCATGG 2520
GACAATTTCA ATATTTTTGT ATCTGTTGAG GCCTGTTTTG TGACCAATTA TATGGTTAAT 2580
TTTGGAGAAG GTTCCGTGAG GTGCTGAGAA GTATATCATT TTGTTTTAGG ATAAAATGTT 2640
CTGTAGATAT CTGTCAAATC CATTTGTTTC ATCACTTCTG TTAGTTTCAC TGTGTCCTGT 2700
TTAGTTTCTG TTTTCATGAT CTGTCCACTG ATGAAAGTGG TGTGTTGAAG TCTCCCACTA 2760
TTATTGTGTG AGGTGCAATG TGTGCTTTGA GCTTTACTAA AGTGTCTTTA ATGAATGTGG 2820
CTGCCCTTGC ATTTGGAGCA TAGATATTCA AAATTGAGAG TTCCTCTTGG AGGATTTTAC 2880
CTTTGATGAG TATGAAGTGT CCCTCCTTGT CTTTTTTGAT AACTTTGGTT TGGAAGTTGA 2940
TTTTATTTGA TATTAGAATG GCTACCCCAG CTTGTTTCTT CAGACCATTT GCTTGGAAAA 3000
TTGTTTTCCA GCCTTTCACT CTGAGGTAGT GTCTGTCTTT TTCCCTGAGA TGGGTTTCCT 3060
GTAAGCAGCA GAATGTTGGG TCCTGTTTGT GTAGCCAGTC TGTTAGTCTA TGTCTTTTTA 3120
TTGGGGAATT GAGTCCATTG ATATTAAGAG ATATTAAGGA AAAGTAATTG TTGCTTCCTA 3180
TTATTTTTGT TGTTAGAGTT GGCATTCTGT TCTTTTGGCT GTCTTCTTTT TGGCTTGTTG 3240
AGGAATTACT TTCTTGCTTT TTCTAGGGCG TGATATCTGT CCTTGTATTT TTTTTTCTGT 3300
TATTATCCTT TGAAGGGCTG GATTCTGGAA AGATAATGTG TGAATTTGGT TTTGTCATGG 3360
AATACTTTGG TTTCTCCATC TATGGTAATT GAGAGTTTGG CCGGGTATAG TAGCCTGGGC 3420
TGGCTTTTTT TTGTTCTCTT AGTGTCTGTA TAACATCTGT CCAGGCTCTT CTGGCTTTCA 3480
TAGTCTCTGG TGAAAAGTCT GGTGTAATTC TGATAGGCCT GCCTTTATAT GTTACTTGAC 3540
CTTTCTCCCG TACTGCTTTT AATATTCTCT CTTTATTTAG TGCATTTGTT GTTCTGATTA 3600
TTGTGTGTTG GGAGGAATCT CTTTTCTGGT CCAGTCTATA TGGAGTTCTG TAGGCTTCTT 3660
GTATGTTCAT GGGCATGTCA TTCTTTAGGT TCGGGAAGTT TTCTTCTATA ATTTTGTTGA 3720
AAATATTTGC TGGCCCTTTA AGTTGAAAAT CTTCATTCTC ATCAACTCCT ATTATCTGTA 3780
GGTTTGGTCT TCTCATTGTG TCCTGGATTT CCTGGATGTT TTGAGTTAGG ACCTTTTTGT 3840
GTTTTGTATT ATCTTTGATT GTTGTCCTGA TGTTCTCTAT GGAATCTTCT GCACCTGAGA 3900
TTCTCTCTTC CATCTTTTGT ATCCTGTTGC TGATGCTCAC GTCTATGGTT CCAGATTTCT 3960
TTCCTAGAGT TTCTATCTCC AGCGTTGCCT CACTTTGGGT TTTCTTTATT GTGTCTACTT 4020
CCCTTTTTAG GTCTAGTATG GCTTTGTTCA TTTCCATCAC CTGTTTGGAT GTGTTTGCCT 4080
GTTTTTCTAT GAGGACTTCT ACCTGTTTGG TTGTGTTTTC CTGCTATTCT TTAS.GGATTT 4140
GTAACTCTTT AGCAGTGGTC TCCTGTATTT CTTTA?>.GTGA GTTATTAAAG TCCTTCTTGA 4200
TGTCCTCTAC CATCATCATG AGATATGCTT TTAAATACAG GTCTACCTTT ACGGTTGTGT 4260
TGGGGTGCCC AGGACTAGGT GGGGTGGGAG TGCTGCATTC TGATGATGGT GAGT3GTCTT 4320
GATTTCTGTT AGTAGGATTC TTACGTTTTC CTTTTGCCAT CTGGTAATCT CTGGAGTTAT 4380
TTGTTATAGT AGTCTCTGGT TAGAGCTTGT TCCTCAGGTG ATTCTGTTAT GCTCTATCAG 4440
CAGACCTGGG AGACTAGCTC TATCCTTAGT TTCAGTGGTC AGAGTACTCT CTGCAGGCAA 4500
GCTCTCCTCT TGCAGGGAAG GTGCCCAGAT ATCTGGTGTT TGMCCTGCC TCCTGGCAGA 4560
AGTTGTGTTC TACTCACCAT AGGTCTTAAG ATCCCATGGT TGGTCCTGTG TGGTTCCTTG 4620
CGTGTGTCCG GAGACTCCCC GGGCCAGGGT CCCTGGTGAT TGGAAGGGAC TTGTGCACCG 4680
GATCAGGCCA GGTTATCTGA TTCCTTAATT AATGCAGTCT CAGGTCCCGT GCGATTGAAT 4740
TGGAGCAGGC GCTGTGTTCC ACTCACCAGA GGTCTTAGGA TCCTGTGGAG GATCCTGTGT 4800
GGGTCCTTGC GGGTGTCTGC AGACTCCCCG GGCCAGGGAC CATGGTGCTG CAGTGGGCCG 4860
GAAGGGACTT GAGCCCTGGA TCATGCCGGA TTATCTGCTT CCTTAATTAA TGCAGTCTCA 4920
GGTCCTGGCG ATTGGATTGG AGCAGGCGCT GTGTTCCACT CACCAGAGGC CTTAGAATCC 4980
CGTGGCGGAT CCTGTGTGGG TCCTTATGGG TGTCCGCAGA CTCCCCGGGG CTAGGGACCA 5040
CGGTGCTCCA GTGGGCCGGA AGGGACTTGA GCCCCGGATC AGGCCGGATT ATCTGCTTCC 5100
TTAATTCCTG ATAGTCTTTT AAAAGTAAAC TTATAGTTAG ACACTGTACA CAGGTATATA 5160
ATACATTTTA AATATTCTCT CACTATGCCA GGTGGTATCA TATA-AG.A5\CT TTTGAATATA 5220
TTTCTTAAAG ATTAATTTTA ATATTTTATG CTCTTATACT ATGCTTAATT CCCA^AGAAT 5280
ATTTTGTATG TTTTGAAACA ATTTACTGTT CAACATTAiaA TATAGGATTC ACAGTTATAG 5340
ATAGTATTA,^. ATGTCCATTA ATGATATTTT TAGGGTATAA AAGGATATGA ATATA^\AAGT 5400
GGCCATAAAG AATATATTCA TATGTATATA TATATGTGAA 5460
TATAATTTTA AAAAGCAGCA GGTATCCCCC CCAAAATACA 5520
TAGAACCTTG TCAAATGATA AACCAAAGAA ATACCAACTA 5580
ATGGATTAGA GTCAGTGGAT TATTCAGGGT GTGGGAGCCT 5640
CCAGACCCCC TAAAAAAGGT ATGCAGACCG TACAGCCATT 5700
TCATTCAGCG GGACTCTGGG TACACATGGC TTGTGTGGGG 5760
TGTTCATTCC TAAGCTGATA TACACACAAG CACATAAGTA 5820
TTGCTTTGGG TGGGGGACAA GTATGTTTGG CAGGGGCTAA 5880
AGGGCTGTGG GAGAGACAGA GATAATAAAT NGATAGGGCC 5940
TTTGTGCCAA GCAGTGTGAA TAGAGGCAAG TTCTAATGGT 6000
TTTTATCCAT GGATTCGAAA GTGTTGGGAG TGGGATGGTA 6060
AAGGAGGGTA GAAAAGGAGA CCAGGAGTGG GATGGTTGTG 6120
AGGTGGAACA GAAGGGAGCT GGGAGAGGTC AGAGTCCGTG 6180
AGAATGTGCT TATAAAACTA CAGAGACAAA GTTTGGAGCT 6240
CTAGAGACTG CCATATCCAG GGGATCCATC CCAT.^ATCAG 6300
TGCATACACT AGCAAGATTT TGCTGJ^JiP.QG ACCCAGATAT 6360
CTATGCTGGG GCCTAGCAAA CACAGAAGTG GATGCTCACA 6420
CAGGGCTCCC AATGGAGGAG CTAGAGATAG TACCCA4GGA 6480
GCTAAAGGGA TCTGCAATGC TATAGGTGGA ACAACATTAT GAACTAACCA GTACCCCGGA 6540
GCTCTTGACT CTAGCTGCAT ATGTATCAAA AGATGGCCTA GTAGACCATC ACTGGAAAGA 6600 
GAGGCCCATT GGACACGCAA ACTTTATATT CCCCAGTACA GGGGAACGCC AGGGCCAAAA 6660 
AAACAAAAAA CAAAAAAAAA TGGGAATGGG TGGGTAGGGA AGTGTGGGGG AGGGTATGGG 6720 
GGACTTTTGG GATAGCATTG GAAATGTAAT TGAGGAAAAT ACGTAATAAA AAATATTAAA 6780
AAAAAACCTA CaVaGGACAG ACAGGCAACC ATTTTAGGAC AACCCTTGCT CCAGTTGTTA 6840 
GGGGACCCAT ATGAAGATAT ACCTTTATAT TTGTTACATA TCTGTGGGTG TTGGAGGATC 6900
TA-Z^GTCCAGC CCATCTATTC TCTTTGGTTG GTGGCTCCAT GAGAGCTCCC ACGGTTCTAG 6960 
GTTATTTGAC TGTTGGTCTC CCTGTGGAGT TCCTACCCAG TTTGGGGCCC TCA^AATTTT 7020
TCTCAGTTTT CTTCTCANAG CTTCTGAACT COVTCCAGTT TTTGGCTGTG AATATCTGCA 708 0 
TCTTCCTGAG TAAGCTTTTG GATAGAGCCT CTTAGAGGAC AACCATACTA GGCTCTTGTC 7140 
TCCAAGTTTA AATGTATCAT TAATAGTGTC AGAGATTGAT GCTTGCCCAT GGGATTGGTG 7200
TCAAGTTGGA CCAGTTAATG GTTGATCATT CCCTCAGTCT CTGCTTCATC TTTGTCCCTG 7260
CATTTCTTAT AAACAGACCA ATTTTTGTTT CAA^AGTTTT ATGAGTGGGT TGGTGTTTTT 7320
ATACCTCCAT TGGGGATCCT GCCTGATCCT GGGGAGATGG CCTCTTCAGG TTCCATATCC 7380
CCTTTACTAT GATTCTCTAC TAAGGTCATT TACATTGATA TCGGAGGTCT TTCTTTATTC 7440
TGGGTCTCTG GCTTCTCCTA GAGATGCCCC ^^J^.TCCCTCAC TCCTAGCAGC TGTAGATTTC 7500
TATTCACTCT CCTGGCCCTC TGGCTTTCAC TCCTGXCTCT TCCCTCACCA CATCCTGAAC 7560
CCCCATACTC CCTTCCTCCA CATTCATGGG TACATTTTTT Ai^JiTCCCAGA ACACAGAAGG 7620
CAGAAGCAGG CAGATCTCTA CAAGTTTTAG GCAAGCCTGG TCTATAGAGC AAATTTCAGG 7680 
ATGGCCAGGG CTACACAGTG AAACTCTATC TTAAAAAACA AAAAAACAAA ATAAGTTATT 7740 
TATTACATAT TTACTTGTTT ATATGTAAGC ATATATGTGT GGGGGCTGAA GAGACCAGAA 7800 
GACAAGTTGT GGAAATTCAT TCTTCTGTTC CATCACATAG ATGCTGGGAA TTAAAATCAG 7860
GTTGTCGGGT TT^GAGACAG GTGACTTTGT TGTCTGAGCT TCCTTGAGAG CCTATAAGTT 7920 
TTTCTTTCAT TGTTAGTGTG CTAGCTGATA TCCACATTGT TTTCTGTGCT AGGTATCCTG 7980
AATTCCAGTT GAGTCCACAT GTCATGGAAT GTCCTCTTAC AACCTCTGCC ACTGGGTTTT 8040 
GTTTCCTACT ATTTAACTTA GGACTTTTTT TTTGGTAGXG ATTCTTACAA GAAAGGTACA 8100 
CATACATTTT TCTTTTTTGA GTTTGATTTG GATCAAGTTA TAATCGTGCA AGTCATGGTG 8160 
CCCTTCTTAC TAAGTCTCTA GGTTGCTATG GCTTTGTGAA AACTTTTGGA TTTTATCCTA 8220
AAAAAATAAT AATTAAAAAA AAATCCAGTA ACAATCACTT TGTGCACATT TATTCCTAAG 8280
CTATA^.GTTT CCACTTCTGT AACGTAGGTA TTTGAGATTG AAGAAGAJ^AT CTTTATGTGT 8340
ATGGGTCTCT TGCTGGCATG CATATCCTTG CACTATGTGT ATATCTGGGT GCCTGTGAAG 8400
GCCAAATTAT GACTACAAAA ACCCAGGAGC TGGAGCTA^^J^ GACCATTGTG AGCCACCAGA 8460 
AGGGTACTGG GAATTGAZ^TC CAGGTCCTTT ACAGCAGTGG ACAATAGATG TT.^l^CTGCTG 8520
AGCCATATCT TTAGCTCTAA CATGGGGACA ATAGCTTACT TATCCCTAGG ACTTATCATG 8580
AGGACCCCA,^. AGAGAGTQ?J-.. .^.-.GTACTTAT AAGATATGAT GTCTTATCCT CTAGAGCAAG 8640
CAAGGTGAAG CCCCTCTTTG AGTATCTTTC CTGGGACACC AAGTCTCAGC AGAACTAGGT 9780
ATATCCTCTC CCACTGAGGC CATCCTGGGT AGTCCAGATA TGGGAJi.GGGG ATCTTATGGC 9840 
ATGCAAAAGT CAGAGACAGT CCCTGCTTCA ATTGTTGGGG GACCATTATG AAGACCAAGC 9900
TGTGCATCTG TTATATAAGT TTAGGGGCCC TAGGTCCAGC CCCTTCATAC TGTTTGGTTG 9960 
GTGGTTCAGT TTTTGTCAGT CCCATAGTTT CAGGTTTGTT GACTGTAGAT TTTCCTGTGG 10020 
TGTCCTTGAC CCCTCTGGCT CACTCAATCC TATCTGTCAC CGTTCCACAA GAATCCTTGG 10080 
GCTTCCTGTG AAGTTTGAOT GTGGCTGGCT ACATTCCATA GCTAATTTTT AAATTCAATC 10140 
TCTCTCTCTC TCTCTCTCTC TCTCTCTCTC TCTCTCTCTC TCTCTCTCTC TCTGTGTGTG 10200 
TGTGTGTGTG TCTGACAACT GTATGTGTGT ATAGAATGCA TTCTGATTAA ATTTTCCCCA 10260
CTCTCTTACC CTATCCCTAT CAGCCTCTGT CCTTCCCATA TTCATGACTT GTTTTGTGTT 10320 
CTGAAGACTT TAGTGCCATC TGTGTGACTG TGGGTTTGGA ACTATCCACT AGAGCCTGTG 10380 
GGTCACCAGT CACCAGGGGA TCACAACTGA GTACAATACC TCCTTCTTTC TCAGCAGTGA 10440 
GTGGCAGGGC TTGTTTCTTC CTCACTGACA CAATTGTCAA GGGATGGCAG GTGTTGGATT 10500
TTTGTGGATA GATGTAGTAG AGTATTTTTT GAGACATGTA CTCCCTTATC CK^TGCTTGT 10560 
CTCAAACACT ATTTTGCTTT GAACTTTGTC TGTGAACTTC TGATTCCCCT GCTTCTACTG 10620
TCTGAGTGTA TTTTTGAATG AJi.GCCAGCCT TGGTGAGAGG GTATTTGTTG TTGAhTTTGC 10680
TTGAATTTCT TATAAAAACC AAGAACTTTT ACCCATCTGG CACTGTTGTT TACTGATGCC 10740
ACACAGAATG TTAGCTCAAA GTAGGTCAAG TTGGGCTGTG GATGAACTAT ACGA^.CTGCC 10800
TAGAGAAGAG AGTACCAAAG TCCAGTGATG CAAAGGTGAT CCATCTACTG GCTTGATCCC 10860
TGGTGCCGCC CATTCTCCCA GCATATAATT ACTGCAGGCT GTCCTCAGTG CAGCAGAGTG 10920 
GGCAGAGGGA AGCATTGAGG AGGATCACAC ACACAGTTGT AGGGAGAACA CAGAGAAGTA 10980
AATTGCTGAC AAACAAGCAG GGATGGACCT GGTTTCAGCT CTCTCACTGG AAACCTGGGT 11040 
GCTCCTAGCA ATCAGCTTGG TGCTCCTCTA CCGGTAAGTG ATCTTTACAT TTCCTTCCCA 11100 
TACCATGTCT TCS^GGATCAG GGTGATACTC AGACATCTAT TCTGTTATTA TTGGGAGGCT 11160 
CAAAATGATT ATCAGAACCA GCAGCTGGAG AGCCGATGGC TCAGTGGTTA AGGTCACTTG 11220
CTGCTCTTTC AGAGTACTCA AGTTTTAAGC CCAACATCCA CAAGCAGCTC AGAATCATCT 11280
GTAACTATAG CTCCAGGGAA TCTGACACCT TCCACAGGCA TAGTTAGTAT GGTATTTAAT 11340
GGTGGTAGCT TTTGTAACCT GGCTAGCTCC TAAATAATTG GGACAGAGAC CTATTAAGTT 11400
TATTAGCAAT TTTTAAGCAC TATGATTGGG CAGGTTCAAA GCTGTTTTAG CCCACAAAGC 11460 
TATCTACATC CCAGCTATAG GCTCAGTTTT ACTTGCACTG TGACTGTTTC CCTGGCTTGC 11520
TCTGCTCCAT GTGTGTCCXC ATGGTGAGCT CCTTTGATGA CTCCTTCCCA TGTCTGACCT 11580
CATGGGAACC TTCTTCTTCC TCCACCTTCT TCTGGCCCTT CTGCTCCTAG ACCCTCATGG 11640
GCCTTGTGGC CAACAACTTC TCTTCTGCCC AGTCATTTGA TCTTCAGTTT ATTATCCACC 11700 
AATCAGAGAT AATTGGGGAA CATTCTTTAT ACCACATTGA TATAGGAGAT TCCTCATTAG 11760
TCATGACAAT ACAGTCCAGA CTGTATCGAT GTCTCAGGTT ACAGAAACCA GCATCTGAAT 11820
ACACAGAGTG A?„AGACCCTC CTCCA-Z^CAGA GAGCAGAAGT TGA^iATTAAG TCTTCC?_A.^LA 11880 
AGTTTTCGA;^. ATTTCATGTT TTTATTTATT GTGGTTAGGG ACAGCGCATG TGAGTGTGTG 11940
TGTGTGTGTG TGTGTGTGTG TGTGTGTGTG TGTGTGTATT TGTGTATGCA GGTCAGAGGA 12000
CAACTTTTCA GAGAGTTCTC TCCTCTCATG TTGGTCCTGA AGACCAAACT CAGATTATCA 12060
ACATTATCCA TCAATGCCTT TACTTGTGGA GTCATCTCAA AGGTCCAAGA TGAAATGAGG 12120 
ACTGAGTTAA TTTTGCATTT TAATGTTTTG GCAGTATGGA GGATCAAGTC AGAGTTTATA 12180 
TATGCTAGGC ACACTCTTCA CTTCTTAGCT ATATTCCCAG TGGTACTAAC TCTTATTAAA 12240 
GCTCATACTG ATGTTCTGCA GATCTTTTGG GTACC 12275 
GGTACCTGGT ATCTGTCCAG AAATTCATCC ATTTCATCCA GGTTTTCCAG TTTTGTTGAG 60
TATAGCTTTT TGTAGAAGGA TCTGATGGTG TTTTGGATTT CTTCAGGATC TGTTGTTATG 120 
TCTCCCTTTT CATTTCTGAT TTTGTTAATT AGGATGTTGT CCCTGTGCCC TCTAGTGAGT 180
CTAGCTAAGG GTTTATCTAT CTTGTTGATT TTCTCAAAGA ACCAACTCCT CGTTTGGTTA 240 
ATTCTTTTAA TAGTTCTTCT TGTTTCCACT TGGTTGATTT CACCCCTGAG TTTGATTATT 300
TCCTGCTGTC TA^TCCTCTT GGGTGAATTT TCTTCCTTTT TTTTCTAGAG CTTTTAGATG 360 
TGTTGTCAAG CTGCTAGTGT ATGCCCTCTC CAGTTTCTTC TTGGAGGCAC TCAGAGCTAT 420 
GAGTTTCCCT CTTAGAAATG CTTTCATTGT GTCCCATAGG TTTGGGTATG TTGTGGCTTC 480 
GTTTTCATTA AACTCTAAAA AGTCTTTAAT TTCTTTCTTT ATTCCTTCCT TGACCAAGGT 540
ATCATTGAGA AGAGTGTTGT TCAGTTTCCA TTTGAATGTT TGCTTTCCAT TATTTAATGT 600 
TGCCTTAGTC CATGGTGGTC TGTGTCTTAG TCAGGGTTTC TTTTCCTGCA CAAACATCAT 660
GACCAAGAAA CAAGTTGGGG ATGAAAGGGT TTATTCAGCT TACACTTCCA TGCTGCTGTT 720 
CATCACCAAA GGAAGTCAGG ACTGGAACTC AAACAGATCA GGGAGCAGGA GCTGATGCAG 780
AGGCCATGGA GGGATGTTCT TTACTGGCTT GCCTTCCCTG GCTTGCTCAG CCTGCTCTCT 840 
TATAGAATCC AAGACTACCA GCCCAGAGAT GGCACCACCC ACAAGGGGCC TTTCCCCCTT 900
GATCACTAAT TGAGAAAATG CCTTACAGTT GGATCTCATG GAGGCATTTC CTCAACTGAA 960
GCTCCTTTCT CTGTGATATC TCCAGCTGTG TCAAGTTGAC ACAAAACTAG CCAGTACAAT 1020
TGACCCCTTG TCAACTTGAC ACACAAACAC ATCACTAGTA ACCCTCAACC CTTACATTCT 1080
TATTCATCCC CAAGATCTAA ATAACTTTAA AAGTCCCACA GTCTTTACAT ATTCTTAAAA 1140
TTTCAATCTC TTTAAAATAT CCATCTCTTT TAAAATCCAA AGTCTTTTTA CAATTAAAAC 1200 
TCTCTTAACT ATGGCCTCCA CTAAAACAGT TTCTTCCTTC AAGAGGGAAA ATATCAGGGC 1260 
ACAGTCAAAG CAAAAATCAA TCTCCAACCA TCCAATGTCT GGGATCCAAC TCACAATCTT 1320 
CTGGACTCCT CCAAGGGCTT GTGTCACTTC TCCAGCCATG CCCTTTGTAG CACAGGTGTC 1380
ATCCTCTAGG TTGCAGATGC CTGTACTCCA CTGATGCTGC TGCTCTTGGT GGTCATCTCA 1440
TGGTACTGGC ATCTCCAAAA CACTGCATGG CCCCTTCAGT CCTGGGCCTT CAATTGCAAC 1500 
TGAGGCTGCA CCGTCACCAA TGGCCTTCCA TGCCCTCTCA CAGTGCCGAG CCTCAGCTGC 1560
TGTGCATGAC CCCTTCATGC CTTCAAAACC AGTACCACCT GGGTGACCCT TATACATTAC 1620 
CAAGTCCCAC TGCAGCAGGA GTACAACCTT GGCTATCTCT GGAACACAGC CTCTTTGTGC 1680
TTTCAGAAAA CACTTCCCAG AAGATGTCAC CTCAACGACG CTGGTCTCTT CTTAATCACC 1740
GATAATTTCT TAGCTCCAGC TAACCAGCAT CAATAGTCAT AGTAATGCAA GGTTTTGCTT 1800
TAGTAGTTCT GGTATCTTGT TAATCACAGT TGATTCTTCA GCCCCAGCTA ACCAGAACTA 1860
CAGAATTTTC ACAATCAAAA CAGCAATGGC CCTGAAAAGA GTCTTTAATT TTCCCTCTGA 1920
AATTTCACAA GCCAGACCTC CATCTTCTGC ACTGTTCTCA ACATTATCTT GTAAGCTCCT 1980
ACACAACATC TGACAGAGCT CTTAACAATG AACGGATCTT CAAGCCGAAA GTTCCAAAGT 2040
CCTTCCACAG TCCTCCCCAA AACATGGTCA GGTTGTCACA GGAATACCCC ACTCCTGGTA 2100
CCAATTTGTC TTAGTCAGGG TTTCTATTCC TGCACAAACA TCATGGCCAA GAAACAAGTT 2160
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GGGGAGGAAA GGGTTTATTT AGCTTACACT TCCATGCTGC TGTTCATCAC CAAAGGAAGT 2 220 
CAGGACTGGA ACTCAAACAG GTCAGGGAGC AGGAGCTGAT GCAGAGGCCA TGGAGGGATG 22 80 
TTCTTTACTG GCTTGCCTTC CCTGGCTTGC TCAGCCTGCA CTCTTATAGA ATCCAAGACT 2340 
ACCAGCCCAG AGATGGCACC ACCCACAAGG GGCCTTTCCC CCTTGATCAC TAATTGAGAA 2400
AATGCCTTAC AGTTGGATCT CATGGAGGCA TTTCCTCAAC TGAAGCTCCT TTCTCTGTGA 2460
TATCTCCAGC TGTGTCAAGT TGACACAAAA CTAGCCAGTA CAGTCTGATA GGATGCATGG 2520
GACAATTTCA ATATTTTTGT ATCTGTTGAG GCCTGTTTTG TGACCAATTA TATGGTTAAT 2580
TTTGGAGAAG GTTCCGTGAG GTGCTGAGAA GTATATCATT TTGTTTTAGG ATAAAATGTT 2640
CTGTAGATAT CTGTCAAATC CATTTGTTTC ATCACTTCTG TTAGTTTCAC TGTGTCCTGT 2700
TTAGTTTCTG TTTTCATGAT CTGTCCACTG ATGAAAGTGG TGTGTTGAAG TCTCCCACTA 2760
TTATTGTGTG AGGTGCAATG TGTGCTTTGA GCTTTACT^VJV AGTGTCTTTA ATGAATGTGG 2820
CTGCCCTTGC ATTTGGAGCA TAGATATTCA AAATTGAGAG TTCCTCTTGG AGGATTTTAC 2880
CTTTGATGAG TATGAAGTGT CCCTCCTTGT CTTTTTTGAT AACTTTGGTT TGGAAGTTGA 2940
TTTTATTTGA TATTAGAATG GCTACCCCAG CTTGTTTCTT CAGACCATTT GCTTGGAAAA 3000
TTGTTTTCCA GCCTTTCACT CTGAGGTAGT GTCTGTCTTT TTCCCTGAGA TGGGTTTCCT 3 060 
GTAAGCAGCA GAATGTTGGG TCCTGTTTGT GTAGCCAGTC TGTTAGTCTA TGTCTTTTTA 3120
TTGGGGAATT GAGTCCATTG ATATTAAGAG ATATTAAGGA AA-^GTAATTG TTGCTTCCTA 3180
TTATTTTTGT TGTTAGAGTT GGCATTCTGT TCTTTTGGCT GTCTTCTTTT TGGCTTGTTG 3240
AGGAATTACT TTCTTGCTTT TTCTAGGGCG TGATATCTGT CCTTGTATTT TTTTTTCTGT 3300
TATTATCCTT TGAAGGGCTG GATTCTGGAA AGATAATGTG TGAATTTGGT TTTGTCAT?3G 3360
AATACTTTGG TTTCTCCATC TATGGTAATT GAGAGTTTGG CCGGGTATAG TAGCCTGGGC 3420
TGGCTTTTTT TTGTTCTCTT AGTGTCTGTA TAACATCTGT CCAGGCTCTT CTGGCTTTCA 3480
TAGTCTCTGG TGAAAAGTCT GGTGTAATTC TGATAGGCCT GCCTTTATAT GTTACTTGAC 3540
CTTTCTCCCG TACTGCTTTT AATATTCTCT CTTTATTTAG TGCATTTGTT GTTCTGATTA 3600
TTGTGTGTTG GGAGGAATCT CTTTTCTGGT CCAGTCTATA TGGAGTTCTG TAGGCTTCTT 3660
GTATGTTCAT GGGCATGTCA TTCTTTAGGT TCGGGAAGTT TTCTTCTATA ATTTTGTTGA 3720
AAATATTTGC TGGCCCTTTA AGTTGAAAAT CTTCATTCTC ATCAACTCCT ATTATCTGTA 3780
GGTTTGGTCT TCTCATTGTG TCCTGGATTT CCTGGATGTT TTGAGTTAGG ACCTTTTTGT 3 840 
GTTTTGTATT ATCTTTGATT GTTGTCCTGA TGTTCTCTAT GGAATCTTCT GCACCTGAGA 3900
TTCTCTCTTC CATCTTTTGT ATCCTGTTGC TGATGCTCAC GTCTATGGTT CCAGATTTCT 3 960 
TTCCTAGAGT TTCTATCTCC AGCGTTGCCT CACTTTGGGT TTTCTTTATT GTGTCTACTT 4020
CCCTTTTTAG GTCTAGTATG GCTTTGTTCA TTTCCATCAC CTGTTTGGAT GTGTTTGCCT 4080
GTTTTTCTAT GAGGACTTCT ACCTGTTTGG TTGTGTTTTC CTGCTATTCT TTAAGGATTT 4140
GTA^vCTCTTT AGCAGTGGTC TCCTGTATTT CTTTAAGTGA GTTATT.^i^AAG TCCTTCTTGA 4200
TGTCCTCTAC CATCATCATG AGATATGCTT TTAAATACAG GTCTACCTTT ACGGTTGTGT 4260
TGGGG"GCCC AGGACTAGGT GGGGTGGGAG TGCTGCATTC TGATGATGGT GAGTGGTCTT 4320
GATTTCTGTT AGTAGGATTC TTACGTTTTC CTTTTGCCAT CTGGTAATCT CTGGAGTTAT 4380
TTGTTATAGT AGTCTCTGGT TAGAGCTTGT TCCTCAGGTG ATTCTGTTAT GCTCTATCAG 4440 
CAGACCTGGG AGACTAGCTC TATCCTTAGT TTCAGTGGTC AGAGTACTCT CTGCAGGCAA 4500 
GCTCTCCTCT TGCAGGGAAG GTGCCCAGAT ATCTGGTGTT TGAACCTGCC TCCTGGCAGA 4560 
AGTTGTGTTC TACTCACCAT AGGTCTTAAG ATCCCATGGT TGGTCCTGTG TGGTTCCTTG 4620
CGTGTGTCCG GAGACTCCCC GGGCCAGGGT CCCTGGTGAT TGGAAGGGAC TTGTGCACCG 4680
GATCAGGCCA GGTTATCTGA TTCCTTAATT AATGCAGTCT CAGGTCCCGT GCGATTGAAT 4740 
TGGAGCAGGC GCTGTGTTCC ACTCACCAGA GGTCTTAGGA TCCTGTGGAG GATCCTGTGT 4800
GGGTCCTTGC GGGTGTCTGC AGACTCCCCG GGCCAGGGAC CATGGTGCTG CAGTGGGCCG 4860
GAAGGGACTT GAGCCCTGGA TCATGCCGGA TTATCTGCTT CCTTAATTAA TGCAGTCTCA 4920
GGTCCTGGCG ATTGGATTGG AGCAGGCGCT GTGTTCCACT CACCAGAGGC CTTAGAATCC 4980
CGTGGCGGAT CCTGTGTGGG TCCTTATGGG TGTCCGCAGA CTCCCCGGGG CTAGGGACCA 5040
CGGTGCTCCA GTGGGCCGGA AGGGACTTGA GCCCCGGATC AGGCCGGATT ATCTGCTTCC 5100 
TTAATTCCTG ATAGTCTTTT AAAAGTAAAC TTATAGTTAG ACACTGTACA CAGGTATATA 5160 
ATACATTTTA AATATTCTCT CACTATGCCA GGTGGTATCA TATpJIGAACT TTTGAATATA 5220
TTTCTTAAAG ATTAATTTTA ATATTTTATG CTCTTATACT ATGCTTAATT CCCAAAGAAT 5280
ATTTTGTATG TTTTGAAACA ATTTACTCTT CAACATTAITA TATAGGATTC ACAGTTATAG 5340
ATAGTATTAA ATGTCCATTA ATGATATTTT TAGGGTATAA JlAGGATATGA ATATAAAAGT 5400
TGAACAAAAA AGAGGGGATG GGCCATAAAG AATATATTCA TATGTATATA TATATGTGAA 5460
TAATTCAAAG AATAAATAAA TATAATTTTA AAAAGCAGCA GGTATCCCCC CCAAAATACA 5520 
GTTGTTGAAG TGCCTTGTGA TAGAACCTTG TCAAATGATA AACCAAAGAA ATACCAACTA 5580 
CCCACCCAGC CACCCAAGAG ATGGATTAGA GTCAGTGGAT TATTCAGGGT GTGGGAGCCT 5640 
GAGGATAAAA AATCAGAACC CCAGACCCCC TAAAAAAGGT ATGCAGACCG TACAGCCATT 5700 
TTATAGTTTT GThlTGAGCT TCATTCAGCG GGACTCTGGG TACACATGGC TTGTGTGGGG 5760 
GTGTGTTGAC AACCTGCAAG TGTTCATTCC TAAGCTGATA TACACACAAG CACAT.4AGTA 5820 
GCACTASATG GTCTGTGACC TTGCTTTGGG TGGGGGACAA GTATGTTTGG CAGGGGCTAA 5880
ATGATAGAAC CACTAAGTTT AGGGCTGTGG GAGAGACAGA GATAATAAAT NGATAGGGCC 5940 
CACATTTCAG GCAGTATACA TTTGTGCCAA GCAGTGTGAA TAGAGGCAAG TTCT.iUVTGGT 6000 
ATTGGCGAAG TGCTTGCATA TTTTATCCAT GGATTCGAAA GTGTTGGGAG TGGGATGGTA 6060 
ACTTGATCCC TCCAGGAGCA AAGGAGGGTA GAAAAGGAGA CCAGGAGTGG GATGGTTGTG 6120 
ACAGATCCCA GGGAAAAGCC AGGTGGAACA GAAGGGAGCT GGGAGAGGTC AGAQTCCGTG 6180
CAATAGCTCC TGGGCAAGGC AGAATGTGCT TATAAAACTA CAGAGACAAA GTTTGGAGCT 6240
GTGACGa_2UvG GATGGACCAT CTAGAGACTG CCATATCCAG GGGATCCATC CCATAATCAG 6300
CTTCTAAACG CTGACACCAT TGCATACACT AGCAAGATTT TGCTGAAAGG ACCCAGATAT 6360
AGTTGTCTCT ATATGTGAGA CTATGCTGGG GCCTAGCAAA ChCAGPJkGTG GATGCTCACA 6420 
GTCAGCTATT GGATGGATCA CAGGGCTCCC AATGGAGGAG CTAGAGATAG TACCCAAGGA 6480



WO02/0H8305 . PCT/US02/11770 




FIGURE 1B (Page 7 of 9)

GCTAAAGGGA TCTGCAATCC TATAGGTGGA ACAACATTAT GTdKCTPACCh GTACCCCGGA 6540 
GCTCTTGACT CTAGCTGCAT ATGTATCAAA AGATGGCCTA GTAGACCATC ACTGG?AAGA 6600
GAGGCCCATT GGACACGCAA ACTTTATATT CCCCAGTACA GGGGAACGCC AGGGCCAAAA 6660
AAACAAAAAA CAAAAAAAAA TGGGAATGGG TGGGTAGGGA AGTGTGGGGG AGGGTATGGG 6720 
GGACTTTTGG GATAGCATTG GAAATGTAAT TGAGGAAAAT ACGTAATAAA AAATATTAAA 6780 
AAAAAACCTA CATAGGACAG ACAGGCAACC ATTTTAGGAC AACCCTTGCT CCAGTTGTTA 6840 
GGGGACCCAT ATGAAGATAT ACCTTTATAT TTGTTACATA TCTGTGGGTG TTGGAGGATC 6900 
TAAGTCCAGC CCATCTATTC TCTTTGGTTG GTGGCTCCAT GAGAGCTCCC ACGGTTCTAG 6960 
GTTATTTGAC TGTTGGTCTC CCTGTGGAGT TCCTACCCAG TTTGGGGCCC TCAAAATTTT 7020
TCTCAGTTTT CTTCTCANAG CTTCTGAACT CCATCCAGTT TTTGGCTGTG AATATCTGCA 7080
TCTTCCTGAG TAAGCTTTTG GATAGAGCCT CTTAGAGGAC i^ACCATACTA GGCTCTTGTC 7140 
TCAAAGTTTA AATGTATCAT TAATAGTGTC AGAGATTGAT GCTTGCCCAT GGGATTGGTG 7200
TCAAGTTGGA CCAGTTAATG GTTGATCATT CCCTCAGTCT CTGCTTCATC TTTGTCCCTG 7260
CATTTCTTAT AAACAGACCA ATTTTTGTTT CAAAAGTTTT ATGAGTGGGT TGGTGTTTTT 7320 
ATACCTCCAT TGGGGATCCT GCCTGATCCT GGGGAGATGG CCTCTTCAGG TTCCATATCC 7380
CCTTTACTAT GATTCTCTAC TA?-GGTCATT TACATTGATA TCGGAGGTCT TTCTTTATTC 7440
TGGGTCTCTG GCTTCTCCTA GAGATGCCCC AATCCCTCAC TCCTAGCAGC TGTAQATTTC 7500 
TATTCACTCT CCTGGCCCTC TGGCTTTCAC TCCTGTCTCT TCCCTCACCA C^^CCTCAAC 7560
CCCCATACTC CCTTCCTCCA CATTCATGGG TACATTTTTT AAATCCCAGA ACACAGAAGG 7620
CAGAAGCAGG CAGATCTCTA CAAGTTTTAG GCAAGCCTGG TCTATAGAGC AAATTTCAGG 7680 
ATGGCCAGGG CTACACAGTG AAACTCTATC TTAAAAAACA AAAAAACAAA ATAAGTTATT 7740 
TATTACATAT TTACTTGTTT ATATGTAAGC ATATATGTGT GGGGGCTGAA GAGACCAGAA 7800
GACAAGTTGT GGAAATTCAT TCTTCTGTTC CATCACATAG ATGCTGGGAA TTAAAATCAG 7860 
GTTGTCGGGT TTGGAGACAG GTGACTTTGT TGTCTGAGCT TCCTTGAGAG CCTATAAGTT 7920 
TTTCTTTCAT TGTTAGTGTG CTAGCTGATA TCCACATTGT TTTCTGTGCT AGGTATCCTG 7980 
AATTCCAGTT GAGTCCACAT GTCATGGAAT GTCCTCTTAC AACCTCTGCC ACTGGGTTTT 8040
GTTTCCTACT ATTTAACTTA GGACTTTTTT TTTGGTAGTG ATTCTTACAA GAAAGGTACA 8100
CATACATTTT TCTTTTTTGA GTTTGATTTG GATCAAGTTA TAATCGTGCA AGTCATGGTG 8160 
CCCTTCTTAC TAAGTCTCTA GGTTGCTATG GCTTTGTGAA AACTTTTGGA TTTTATCCTA 8220 
AAAAAATAAT AATTAAAAAA AAATCCAGTA ACAATCACTT TGTGCACATT TATTCCTAAG 8280
CTATAAGTTT CCACTTCTGT AACGTAGGTA TTTGAGATTG AAGAAGAAAT CTTTATGTGT 8340
ATGGGTGTCT TGCTGGCATG CATATCCTTG CACTATGTGT ATATCTGGGT GCCTGTGAAG 8400
GCCAAATTAT GACTACAAAA ACCCAGGAGC TGGAGCTAAA GACCATTGTG AGCCACCAGA 84 60 
AGGGTACTGG GAATTGAATC CAGGTCCTTT ACAGCAGTGG ACP-ATAGATG TTA.:iCTGCTG 8520
AGCCATATCT TTAGCTCTAA CATGGGGACA ATAGCTTACT TATCCCTAGG ACTTATCATG 8580
AGGACCCCAA AGAGAGTGAA AAGTACTTAT AAGATATGAT GTCTTATCCT CTAGAGC/uAG 8640
AAAGCCAGAG AGGAAATCCT GCTTTATTTT TTTTTTAGTA CTCATTGTCA GCTTGCTGGT 8700
CTCCCTTACT TTGTCCCTGC TTAGAGGGAT GAGTGTGGGG TTTTTATTAC CC-.TTGGGGG 8760 
AACATCCCAA TTGGAATGAG GTGCTGGTTT CTCGACTAAT CCTGTATGAC ACCAAAGAAG 8820 
TATGAATCTG TTAAAGGTGA AAATTTTGCC ATCAACAACC CAACCTTCAT ACTTAAGTCT 8880
CAGAGAATAC AGAGGAAGAG GGCCAGTAAT ATATTAAGAG TTAGAGGACT AGGAATTCTG 8940
CTCTCAGATG GxhTCTCCAA GAAATGGAGG CAGGACCAGA CACATTAAAT ATCAACAATC 9000 
TATACAAGAT ACAATGAAAT CTCAAATAGG CATGGTAAAG AATATATATA TATATAACAC 9060 
AATAATAATA ATCGCAAAGA AGCCATGAAT TTGATAGGGA GTTGCGAGAT GGGA^^^GAACT 9120 
GGAGGGAGGA GATGAAAGAA GATGATCTAA TTTCATTGTA GTTAATAATT TTAAAAGATG 9180 
AAGAACTTGA ACTTTAGAAC AACATGGTCT CTTGGATCCT GGTTTCATTA AGGATTTATT 9240 
ATGTAACCTT GATTGAATCA GTTATCATTT GGGGTATGGT TTGTTCACTT GTGACAGAGT 9300
TATCCCTCAC AACATTGCAG GGTAGATGAT 9330
GGTACCTGGT TATCTATTGG GACTGGTTGG ACAAGAGGGT GCAGCCCACG GAGGGTGAGC 60

CAAGCAGGGT GGGGCGTCGC CTCACCTGGG AAGCACAAGG GGTCGTGGAA TTTTCTCCCC 120 

TACCCAAGGA AAGCCATAAG GGACTGAGCC TGAGGAACTG TGCACTCTGG CCCAGATACT 180 

GCACTTTTCC CATGGTCTTT GCAACCCGCA GACCAGGAGA TTCCCTCCGG TGCCTATGCC 240

ACCAGGGCCC TGGGTTTCAA GCACAAAACT GGGCAGCCAT TTGGGCAGAC ACCGAACTAG 300 

CTGCAGGAGT TTTTTTTTTT TTTTTCCATA CCCCATTGGC ACCTGGAACG CCAGTGAGAC 360 

AGAACCGTTC ACTCCCCTGG AAAGGGGGCT GAAACCAGGG ATCCAAGTGG TCTGGCTCGG 420 

TGGGCCCCAC CCCCATGGAG CCCAGCAAAC AAAGATTCAC TTGGCTTGAA ATTCTTGCTG 480 

CCAGCACAGC AGCAGTCTGA GATTGACCTG GGACCCTCGA ACTTGGTTGG GTGCTGTGGG 540 

GGGGCATCTT CCATTGCTGA GGCTTGAGTA GGTGGTTTTA CCTTCGCGGT GTAAACAAAG 600

CTGCTGGGAA GTTTGAACTG GGTGGAGCTC ACCACAGCTC AGTAAGGCCA CTGTGGCCAG 660 

ACTGCCTCTC TGGATTTCTC CTCTCTGGGA AGGATATCTC TGAAAAAAAG GCAGCAGCCC 720 

CAGTCAGGGA CTTATAGATG AAACCCCCAT CTCCCTGGGA CAGAGCCCCT CGGGGAAGAG 780 

GTGGCTTCCA CCATTGTGGA AGACTGTGTG GCAATTCCTC ACGGATTTAG AACTAGAGAT 840 

ACCATTTGAC CCAGCAATCC CATTACTGGG TGTATACCCA TAGGATTATA AATCATTCTA 900

CTATAAAGAC ACATGCACAC TTATGTTTAT TGTAACACTA TTTACAATAG CAATGACCTG 960

GAACCAATCC AAAAGCCCAT CAATGATAGA CTGAATAAAG AAAATGTGGC ACATATACAC 1020

TGTGGAATAC TATGCAGCCA TAAAAAAGGA TGAGTTCATG TCCTTTGCAG AGACATGGAT 1080
GAAGCTGGAA ACCATCATTC TCAGCAAACT AGCACAATAA CAGAAAACCA AACACTGCAT 1140
GTTGTCACTC ATAAGTGGGA GTTAAACAAT GAGAACACAT GGACACAGGG AGGGGAACGT 1200
CACACACTGG GGCATGTCGG GGAGTGGGGG CCTACGGGAG GGATAGCATT AGCAGAAATA 1260
CCTAATGTAG GTGACGGGTT GATGGGTGCA GCAAACCACC ATGGCACATA TACACCTATG 1320
TAATAAAACT GCACGTTCTG CACATGTACC CCAGAACTTA AAGTATAATT AATAATAATA 1380 
ATAATTTCTG GGCATGTAAG TAGCTGTCTT TCAGGTTCTA CTTTGATACA TATTCTGAGA 1440 
GAATTAAACC TGTCAAAGAA ACCTTGACTT TCAATGGCAG GCACTGGAAT TGACCCTAAT 1500
AATGTGTTTT GGGGTAAGCC TACTCATATT CTCAACCTGT CTGCAGTAGT CGTTAGAATC 1560
TGAACTTCCT GAAGTTCATG TGCAAAGTTG AGTTAATTGT TTAATATTCA ACAAGGATTA 1620
TGCCAGTAAG ATGGTAGGAA AATATTAGAT ATGTGTCATC ACTGCTGGTA TTATTTAAAC 1680
TGCAACATAT TTTAGCTGGC TGCTGATCTC AGCCACCATG CCTGCATTTT ATCTCTGTCT 1740 
CGTGGTCTGC AACCTTGGAA GCTTTGAACT TAGCTCATAG AATCCTGGGC ATCAAGAACA 1800
TGTGGTTCTA ATGGCTAGAT AGGGAATGAG AGTAAAAGGA TTTTGCCCAC GGTCACGTGA 1860 
GTAAACAACA GATTTGGAGG GGTCTGGACT ACTGTGATGA CTTCATTCTG ACAATATGTT 1920
CCAGTTGTCC TTTCATTTCC TCCTAATCAC ATGTTTGGTC TGATTTGGCT GTTTCCCACC 1980
TTCCAATTCC TGCCTTCTCC AATGCTCCCT TCCGTAGGTC ACTCTGTGGC TCAGAGACCC 2040
TGCTTAGCAA GCGCCCAACC TTTCAATTAT TTGTTCAGTA AAACTTGAAC TCATGTCTCC 2100
CCTTCTTGAT AAAAAGAAAA TACGTTATGT AATGTCGGGT TACTCTATAA CTCTTGTCCT 2160
GTCTCTCGGC AACTAGTGAA CTAACTGTTT TCATATTGAG CAAACGTTTA TGGAAGGACT 2220
GCCAAGAGTC AGGTACTAGG CTTGGTAATA TTCCCCGTTC TCTCTAGTCA AAGCCAACAC 2280
CAGCCAGACT TGCAGATCTA GGTCCCAAGC CCACTGCAGA TCACAGGCCA GGGTCTGGTC 2340
TCCTCTGAGC TCCTTTGGGA GGGAAAGACA GAATTATTAA CACCCATTTT GTAGATTAGG 2400
CAACTGAGGC TGAGGAAGTT TAAATAACTC AGACAGGGCC TGCACGTCAG TCATATTCCA 2460
AGGATCCCTA CTCACTGTCT TCTCTCTACA GAACGAGATG TCTCTGGAGT CCATAGAAAG 2520
CCCAGGAGCC TGGCTGGGCA CGGTGGCTCC TGCCTGTAAT CCCAGCACTT TGGGAGGCCG 2580 
AGGCAGGCAG ATCACCTGAG CTCAGGAGTT CAAGACCAGC CTGGGCAACA TGGCAAAACC 2640 
CCATCTCTAC TAAAAATACA AAAAATTAGC TGGGCGTGGT GGTGCATGCC TCTAATCCCA 2700 
GCTACTTGGG AGGCTGAGGC ACAAGAATTG CTTGAGCCCA GGAGGCAGCA GTTGCAGTGA 2760 
GCTGAGATTG TGCCAGTGCA CTCCAGCCTG GGCAACAGAG CAAGATTCCA TTTCAAAAAC 2820
AAAAACAAAC ACAAACAAAC AAACAAAAAT AGAAAGCCCA GGGACCACCT GCGTCAGGTT 2880
CCCAGCCACA CCTTTTTCTT GTCCTCCTCT GTCTCTGGCA TCTTCTCACA GGTTCCTAAT 2940
TGTTTGTGGT TGCACAAATT CAAAATCCCA GAAAAATTAC CACTTCACAC CCACTCAGAT 3000 
GGCTATTTTT TTTTTGAAGG AAGATAACAA GTGTTGACAA GAACATGGAG AAATTGGAAT 3060
TCTCACCCAT TGCTGGTGAG AATGTAATAC GGTGCTGCTG CTATGGAAAA CAGCTTGGAG 3120
TTTCCTCAAA AAGTTCAACA GAATTTCAAT GTGACCCAGG AATTCCCCTC TAAGTTATAG 3180
ATCTGAGAGG ATTAAAAACA GTTACTAAAA TACACGGACT CACATATTTC TAACAGTCCA 3240
ATTCACAAGG GCCAAAAGGT GCTAATAGCC CACATGTCCA TCGATGGATG GATAAATAAA 3300
TTGTGGTCTA TCCATACAAT GGAATATTAT TCGGCCATAA ATGGAATGAA GTACTGACGC 3360
ATGCTACAGA ATGGATGAAC CGCAAAAAAA ATGGATGAAC ACATGCTACA GAATGGATAG 3420
CCTCACTTTA CTATGAAGTG AAGGCCAGAA ACGAAGTCCA TATATTCCAT CATACAAAAT 3480
ATCCAGAAGA GGGAAGCCCA CAGAGACAGA ATGTGCAATG GTGGATGCCA GGGTCTGGGG 3540
AGAGGGGAGA GTGGGGAGAA ACTGCTCAAC TGGTACAGGC TTTATTTTGG AATGATGGGA 3600 
ACATTTTGCA ACTAGATAGA GGTAGTGATT GCAGAACACA GAATGTACTG AATTCCACTG 3660 
ATTTTTTTCA CCTTAAAATG GTTAATTTTC AGTCCTGAGA TTCGATAATC ATAAAAAAAT 3720
GGTTAATTTT ATGTTATGTG AATTTCATCC CTATACATAT TTTAAACCTC AGAAATATAC 3780
ACTAGCAGGC ATGGAACAGG TCACTGTGGT GCCTGCCAAG CCCGGTGATG TTATCTGGGG 3840
TCCCCGGCCA GCCTTAAGCC TCTTGCTGAC CGGTGGAGGG CAGAACCTTT GCCCTAAAAG 3900
TATAATATCC ACATGCTGGC ATGATTCCTG GCCAGATGGC TTCTTTATTA GCAGTAATTG 3960
AAACTGCCTC GATACAGACA CTGTACCTTG CAACCAAAAA ATGACTCAAC AATGATAATA 4020
AGGGTTAAGC TGGGCCTTTC TCTCTTTGCC AGTTAAATTA TATTTATTAT AGCTTGACAT 4080
GAAAAACAAA GCAACTCCAA CAGGTATCAC AAGGGCAAAG GACATGAACA TTTTATCAAA 4140
GAAGAAATGC AGCTGTCAAA AATACAGAAA TATTCAACCT TGTTCATAAT AAAGTGGCTG 4200
GGCTCAGTGG TTCATGCCTG TAATCCCAGT GCTTTGCAAG GCTGAGACAG GAGGATCATT 4260
TGAAGCCAGA AGTTCAAGAC CATCCTAGGC AAGTCAGTTC AATACCAGAC TTCATGTCTA 4320
CAAAACATCA AAAAATTAGC CAGGCATGGT GATGCATGCC TGTTGTCCCA GCTACTCAGG 4380
AGGCTGAGGC AGGAGAATTG CTTGAGCCTG GGAGGCTGCG GTGGCGGTGA GCCATGATTG 4440 
TGCCATTGTA CTCCAGCCTG GGCAATGCAG CAAGACTGTC TAAATAACAA AAATAATAGT 4500
AAAGAAAAGG ATTGGGATGC CATTTACTTG CGTATTCAAT ACACAGAGTT AAAAGTAATT 4560
TCTACGTTTT CTATTTTTTT ATTACTAAAA AAAGCTGGAC CATTCTCACA GCCTGAAATG 4620 
CTTCTCACTT TCCCTTCTTC TGTCCAAACA CTTCTCTATG ATAATGCAAA CAGTCACTCC 4680 
TTTAGGAAGA CTTCACCCCA GGTAGTTCCA GATCCCCTTA TCTCTGCCTT CCCAGAACTC 4740
CTGGTGTCTC TCCAGTTCCC TCCGTGTGGT GAAGTACCCT ACCTAGGGTT TCAGTATGGC 4800
TCTGTCTGCA AAGGTCTTGT TCACACCTTC CCTTATGGTT CTGTTGCCCT GTGTTGTGTC 4860
ATAGCACAGG GCACAGTGGA GAACCCATTC ACACTGATAG AGAGGGCCCC ATGGTCCTGG 4920
AGATAACCAT GTAACCGATC AGAATAAGGC ATTGAGGGCT GGGTGTCAGG CGTGGGCTGC 4980 
ACTTGGGTGG GCAGGTCCCC TGGAAAGTCA CTGGGTTTGG CAAGCTTCCT AGTA.^.CATGT 5040
CTCTCTGGGG TCCCCCTTGG AACTTCATGC AAAAATGCTG GTTGCTGGTT TATTCTAGAG 5100 
AGATGGTTCA TTCCTTTCAT TTGATTATCA AAGAAACTCA TGTCCCAATT AAAGGTCATA 5160 
AAGCCCAGTT TGTAAACTGA GATGATCTCA GCTGAATGAA CTTGCTGACC CTCTGCTTTC 5220
CTCCAGCCTG TCGGTGCCCT TGAAATCATG TCGGTTCAAG CAGCCTCATG AGGCATTACA 5280 
AAGTTTAATT ATTTCAGTGA TTATTAAACC TTGTCCTGTG TTGACCCCAG GTG.^.ZiTCACA 5340 
AGCTGA.i>.CTT CTGACPJIGAA CAAGCTATCA TATTCTTTTC AATTACAGAA A.^AGTAAGT 5400
TAATTGATAG GATTTTTTTT GTTTAJJVAAA AATGTTACTA GT-TTTGA.AA.A GGT.^^.TATGT 5460



GCACATGGTA AACACTAJVGA AGGTATAAGA GCATAATGCT TTTATACTAC TAAG?_A 



TAAT 552 0 
GTTTTCTCTA AGTTTTTTTT GGTAGATGCT TTCATCAGAT TA.i^GAA^ATT CCCTGCTATT 5580
AGTTGTTGAA GGTTTTto. TCATAAATGA AAGTTGAATA TTATTATCAT ATATTATTAA 5640 
TATATTGTTA TTGAACTATC AAAGCCTTTT CCTAAAACCA TTGAGATGAT CTTATAACCA 5700 
TTCTCCTTTA ACCTGTTGAC GAGATCATTG GTATTTATAC TATTTCTCTG TTAACCATTC 5760
TTGAGTCTCA GGTTTAAATT CAACTTGGTC ATGGTGTGTC ATCTTTGATC ATTGCTGTCT 5820 
GTGGCTTGCT ACTGTTTTGT TTAGGATTTT TGCACTGATG CTCATCAATG AGACTGGCAT 5880
GCCATCTTCC TTTGCAGTCC TGATTTTTTT CTGATTTGGA TCATGTGGTT ATGGCCCTCA 5940 
TGGAATGAGT TGGGCATGAT GCCTTTTTTT CATGTCTCTG GATTGATGGG ACACTTTGGA 6000 
TTCTCTCCAG ATGGCCCTCA ATGGTCCCTG CCTCCTCATT GTTAGGCCCC TGGGCAAGCC 6060 
CTTCTCATTT CTGGTAGGCC CAGGAACCTG TGGGGGTTTT GTTTGTTTGT TTGTTTCTTG 6120
AGTCGGAGTC TCACTCTGTC ACCCAGGCTG GAGTTGGAGT GCA.^TGGCCC GATCTTGGCT 6180 
CACTGC^J^CC TCCACCTCCC AGATTCAAGC AATTCTCCTG CCTCAGCCTC CTGAGTAGCT 6240
GGAATTACAG GCACCCACCG ACACACCCTG CTAATTTTTG TATTTTTAGT AC?.GATGGGG 6300 
TTTCACAATA TTGGCCAAGC TGGTCTCGAA CTCCTGATCT CATGATCTGC CCGGCTTGGC 6360 
CTCCCAAJ.GT GTTGAGATTA CAAGCATGAG CCACCACACC CAGTGAACCT GTGGTTTTTA 6420
GAAGCTCCCC ATGCATGTGA ATGCTGTGAG CATCCCAGGA TGACAGCCAC TGTGTGTTCA 6480
GCTGTTC-QAA CTGTGAGAAA GCACCAGTGG GACCTTCTCC AGCACCTGCC TGCTGAGTTC 6540
ATGGAAGAGG CTTGTTGGGG AGATGATGCC CTGGCTGACT CCTGAAGGAT GGTTAGGAAT 6600
GCACCAGATG GAAGCTGGGT TGGACCCACT CTATGCTGAA GAACAGCTTG TGTGGACACA 6660
AGGAGACACG GATATGTCAT TTTTGTAGAG CCTGAGGAGT GTeCAATCAC ACCATTTGCT 6720 
TAAAACATCA TGCACACTTG GAAAAGTGGA CTGAGACCGA ATGAAGAAGC TAACAGTGGC 6780 
CAGATCAGAA AGGGTCTTGT GTTACTTCCT AGAGATACTT AGATTTTATC CTGTGGGTGA 6840
TAGGAGCAGT TGGAGGGACT GAAGACAAGG AAAGAAACAT GTTTCAAGAT CTATGTTTTT 6900
CAAGACGCTT TTCTGGTGGC TGAGTAGGGA ATTCCCTGGA TAAGTCCTGC CCAGGGTCAG 6960 


GCAAAACAAG TTAGGGGGTT ACTGAAATAA GGAGTATGAG AAATGGTGTA GGTTGTGCTG 7020
ACGTTTTGTA ACACATCTCA TGATGATCTT CATTTCCTTC ACTAATTTCC TGTTTCATTA 7080
ATTCCCTTCC ACGTGCTCTT CTGAAATTTG CCTCACATTC TCTGATTTCT CTTTTACCTG 7140 
TTGGTTTCAT CACCTTTTAC TTTTTGCTTT CCTGGAAACA CAAATGATTC TGATTGTGAC 7200
ATGTCAGAAT TATTTGCAAC ATTTGCCTTT CTGCTGAAAC CATGAGTTCA CTG.i.^TACAC 7260
AATTTAGTAA AGTGTAGGAT GCACATGTCG TTTTCGTGGT CACAACCAGC TCTGTAGCAT 7320
TTTATAACTA CACTGCTAGT GTGCTGGGAG GTGTAGAGAG AAATATTTAT CACATGTGTG 7380
GCTGACACAA CCTGCCAAGT TATTTTAGGA GCCTCCTTGG AATCCCAGCA AGAATCCTAC 7440
CGGCACAATT TGTAATCACA GCATCCTGCT CCATGCCTTG GCTTCATGGC ATAGTCACTT 7500
CTGCaji.GTCT CTTTCCAGCT GTCTGTTCCC ATGTCTATAA AGTATGAGTT AJLiXCATCCT 7560 
AACACTACTC ATCTTACAAA GTTTTCTTGC TGATGTTAAG AGAGTTGGGA AAG.^-.CTGTA 7620
TAAACTGTGA AGTGCCATGG AGATGTTAGT GGTTACTTTA TCAAGAAATA GACACTCTAG 7680
A.HTGGAGTAG AAAGCCAACA GTTATGATTG AGTCCTCCTC CTCTTCTTCT TTTTATTAAT 7740
TTATAAAGAA AAGAGGTTTA ATTGACTCAC AGTTCCATAT GGCTGGGGAG GCCTCGGGAA 7800
ACTCTCAGTC ATAGCAGGAG GC;iAAGGGGA AGAAGGCACC TTCTTCAC^A GGCGGCAGGA 7860 
GAGAGAGAGC TCCTGTTCTT TTTTGTCATA AAGTCTACAG AAGTGCTTAT ACTTCAGGAC 7920 
AAGGGCAGGC AGAGAGAAGG AAGGACATTG CTTCACCCCA GCCCTCACTG ACGAGTTTGC 7980 
TAGGGGACCT CACTTTGTCC CAGAGTAGGG CAGAACTCTG GCCACTACCC ATTCAGAAGG 8040
CCTGGGCTGC ACfGCTAGTT CCTCACTAAC TCTGTGTGGC CTTGGGCAAG GTTGGGCCTG 8100 
TGTTAACAGA TTATGACCCT GGGCTCTCAA GCTAGAGGAT CTAAATTTGA ATCCTGGCTC 8160 
TGCTAAAGCA ATTAGTGATG TAAACTTTAA TGGGTCAGTT AACCTTCCTG TGGCTTAGTT 8220 
TGCTCATCTG TAAAATAGGG ATCATAiVCAG TATCAATACC ACATGATTGT TGGACAGATT 8280 
GAATCAGTTA ATGCAGGGGA AGTACTTAGC ATGACACGTA TTCACTATCA TTTCCTGGAG 8340 
TAAGAGCTGT GTGTGAGTGG GTGTGAGCAT GTGTGAAACC TTTTCTCTGC AATCTCAGTT 8400 
AAGAAACCA!\ TCCAGAATTT AAAGTTCAGG GCCTAAATGG GTGGTTATCT TCTCCCAGTT 8460 
CCATCCTATC CCACCTTTGC TCTTCCTCCC GCCCAC^GGA GCTGTTGGTC CTTGATTGGG 8520
CTGGA?.GACC TGGTGGACCC TAAGTGATCT ATAAGAGGAG AATAGAGAAC AGGGAATGTC 8580
TTCAAA^ATC TAGAGGGACA CAGAGGCTGA GAGGCAGGCA GTCCTGCAGG GTCTTCTGAT 8640
TGGGACAAGG AGAACCTTGG TCTTCACAGG CCAATTCTGG TCAGTTTCCC CCATGGACAG 8700
ATGAGGAAAC AGGCCCAGGA ATATCCAAGG TCTCACACTT CCCATCTGTC i^AGTCTTGTT 8760 
GATTCTGTTG TATTCATGTC TCTCAAAGGG AGATAGAGTT TAGGGAAG^vA AGA^.GGATCA 8820 
ACTGTGTCTG ATACCACTGG GAGCTTAAGT AAAGGGTTCT TTTACTTCAT AGCATTTATC 8880
CCAATTTGTA ATTCAGTATT ATTTGTGTGG CTGTTTGGTG TCTCTTTCTC CTATATGAGT 8940
GCTAGCTTCA TAAGGGCAAG GATTTTGATT CTTTAATATT TAGTGCTTGC CACATGCCCT 9000
GAACACAGCA GGCATACAGG CTAACCAACA TACAGTGGCA TGAAAGTCAT GAAAGTGAGA 9060 
CACCTACCTC CTCCAGTGCC AAGAGAGCAT AACCATGCAC CTGTCACTCT CCTCAACACC 9120
ACCCCCAZ^GC ATGAGGCCCA AAAGCATTAG CTAATCCCCT CCTCCAGCCA CTAAAACTTA 9180 
AAGGCCAGGT GTGGTGGCTC CCATCTGAAA TCCCAGAACT TCAGGAGACA GCAGCAGGAG 9240
GATCACTTGA GGCCAGGAGT TTGAGATCAG CCTGGGCAAC ATAGCTAGGT CCCATCTGTA 9300 
CTAAAAATTA GCTGGGCGTT GTTGCATGCC TGTAGTCCCA GCTACTAAGG AGGCTGAGGT 9360
GGGAGGATCA CTTGAGCCCA GGAGGTGGAA ACAACAGTAA GCTATAATCA CAGCACTGAA 9420 
CTCTAGCCTG GGCAACAGAG TGACACCCTG CCTCAAAACA ATTTTAAAAA TAAATAAGAG 9480 
CAAAACTTAG ATACCACGTG GTCACCCCAA CATGCAAAAT CAAGTTTTCC CCTACTGAGA 9540 
AGAATGGGGA CTTGACAGCT GAGTTACAGA GAGATAATCT TCTTCTTCTT TTTTTTTTTT 9600 
TGGTTTACAT CCTCAAGATC ATGACTTGTG AAATTTGAAT CGAATACACA TGT.^ATTCCA 9660
GAGCAATGTT GCCTCCGCAT ACCATCAGCA ATTCACTTGG CTACTGGAAG TCAGGATAAG 9720
CTTCCCAG.A,^. GAGAGGTACC ACTTGGGCTA GCAATATAAA AGGATGAAAA TATCAGAGTG 9780 
ATGGTGTTCT TTACAACGTT GAGTCCCTGG ACAGCCTGTC CACTGATGCT GATATCTGAG 9840
CCTAATGCTT CTCTGAATGT TGAGATTGAA CTTTGATCCA ATGAAACTAG AACGAGAAAG 9900
AAGATA.:^GTC TTTCATTGTT GATAAGGACA TTATGTTTCT CATACTTGTA TGATTATTTT 9960
TCCTTAGCTG TACTATAAT7 ATCTGCTTAT TTGTCTCTGC TCTATGTGCT TAGGGTACAA 10020 
AGTTGACCAA GACCAACTTT GGTTGGAAGC ATAGTACTAA GAGCACAGTA CTGAGAGCAC 10080 
AGTATTGAGA GCACAGCTTT AAAAAACATG ATGAAGGCTT TAATACAGGA AATGAGCAGG 10140 
GGAGAGGCAT GTGGTG6TTG GATGTATCTT CCTTGACACA GTCAGTGCAG CTCTCAGTAG 10200 
TCAAGTCCCT ACATGTTAGA AGATGTTACC TTCTGTGGAA TTAAGTGGCA GAACTTGCCT 10260 
TCAATTATTT TCCTTTGCAG AACAACACCA ACTGCATTAG TTAGGACACA GTGCTGGCTG 10320 
CATTTAAGTC CCAAGCGATG ATTAGTCTCT CACTGTTGGT ATAGATTCAA ACCAATCAGA 10380 
CCACCTCCTA AAGTTTGTAG GGCAGGTAAA TCCTCATCTT AGAATAAAAA TCATCTTACC 10440
AAGTATGTGT TTTAGAGGCA AGAAG.^y^AAC ATATTTGTTT CTGTAAGAGT TTTGTTTAAA 10500
AAAAATATAA GAAAGGCTCT CGGTTTAGGT GAGGTAATGA AGTTGTTGAT AGTTATCAGA 10560
TGACACTGGA ATCTTTACTT CTCTGAACGT GTTCTGTGCA TCTCTCAGTG TGGG.Hji.CATA 10620
GAGAGGGAGA TCCTCCAGCA ATGCCACTGA TATGGTCAGA AACTGa^TCT TTCTTTCTCC 10680 
CTGCTGAGAT GAGATGGAGT CCTTTGTTCT AGAAGACCCA TGGTGGTGCC GCTGGGAGTA 10740
ACCCTTGAGA CAGGAACACA AATCCCA^^.CC AATTTGTGGT TGCAGCCTTG AGTCTCACTA 10800 
TTTCCCATAG TGATGCGTAG CAGGGi^ATGG CAGGTGCACC AGAGCAGGAG AGGACCT?AT 10860 
ATCTCCCTTC CTGTTAGCTT TTTATAAAGT TTTATTGTGA TCAGTAGCAG TTGGG.^GCT 10920
ACTTGCAGTC ACTGAGCCTC AGTTTCTACA TCTGTAAACT GGGGATAGTA GCATGGCCCC 10980 
TACTT.i^.TGT GCTCAGCAAA GCCACTQf._AA GGAGACAGAA ATGTATCTAJi ATTACCCTGG 11040
ACTTTTATCC TACCTCTCTT GGGGATTGTC ACCACCTTCC CATGTTTGTC CTTTTTGGTT 11100
TGATGCTTGC TGTCACTTCT TTCCTTAGGT GCCTCTCTGT ACGGCTCTTT TATCCCAGGG 11160
ATTCCAGAGT TACAGCACAT GCATACCACC ATCCAAGCAT GTTTATTTGT CTCCTGCTTC 11220
ACTAGGCTGT CCCCAAGGAA CATGTGGCTC CCGGCACACA CCTGGCACAA CACTGCACAT 11280
GACATTCACC CACTTGGCCT TGAATCTGAC AAGGAATCTG GCATGATGTT CACCCACTCA 11340 
GGCCAGGTGC CG^GCAGCCC TGGAGGCTTA GGGGCCAGAG GGATGGGAAA AGGTGTCTTT 11400 
CTGGGGTGAG TATCAGTTTC TGCAGGAGGG CTGAATGTGA GAAAGAATAA AGAGAGAAGG 11460
AAGCGAACAA GCACAGCTTA AACATCGCCT ATTTCTATTG AGTTTTAAGA ACGCTGTGAT 11520
TTTGTTTGTC ATGCAATCCA TTCATCAGGC CAGGCAGACA CAGAACTTGG GTGTGAGTGA 11580
CGATAATGAG CTGATATAAT TTTCACACCC TCATCACTGA GATCTCTCCC ATCAGGAATG 11640 
GGTCAGGGAG CTCACAGGTG GCAGCAACTG CTATTACAGG CCTCATCTCT ACCAGCTCCT 11700
GGGGCCTGCC CTCCTCCCAT TAGAAAATCC TCCACTTGTC AAAAAGGAAG CCATTTGCTT 11760 
TG^JVCTCCAA TTCCACCCCC AAGAGGCTGG GACCATCTTA CTGGAGTCCT TGATGCTGTG 11820
TGACCTGCAG TGACCACTGC CCCATCATTG CTGGCTGAGG TGGTTGGGGT CCATCTGGCT 11880
ATCTGGGCAG CTGTTCTCTT CTCTCCTTTC TCTCCTGTTT CCAGACATGC AGTATTTCCA 11940 
GAGAGAii.GGG GCCACTCTTT GGCAAAGAAC CTGTCTAACT TGCTATCTAT GGCAGGACCT 12000
TTGAAGGGTT CACAGGAAGC AGCACAAATT GATACTATTC CACCAAGCCA TCAGCTCCAT 12060
CTCATCCATG CCCTGTCTCT CCTTTAGGGG TCCCCTTGCC AACAGAATCA CAGAC-QACCA 12120 
GCCTGAAAGT GCAGAGACAG CAGCTGAGGC ACAGCCAAGA GCTCTGGCTG TATTAATGAC 12180
CTAAGAAGTC ACCAGAAAGT CAGAAGGGAT GACATGCAGA GGCCCAGCAA TCTCAGCTAA 12240 
GTCAACTCCA CCAGCCTTTC TAGTTGCCCA CTGTGTGTAC AGCACCCTGG TAGGGACCAG 12300 
AGCCATGACA GGGAATAAGA CTAGACTATG CCCTTGAGGA GCTCACCTCT GTTCAGGGAA 12360
ACAGGCGTGG AAACACAATG GTGGTAAAGA GGAAAGAGGA CAATAGGATT GCATGAAGGG 12420 
GATGGAAGGT GCCCAGGGGA GGAAATGGTT ACATCTGTGT GAGGAGTTTG GTGAGGAAAG 12480
ACTCTAi^.GAG AAGGCTCTGT CTGTCTGGGT TTGGAAGGAT GTGTAGGAGT CTTCTAGGGG 12540
GCACAGGCAC ACTCCAGGCA TAGGTAAAGA TCTGTAGGTG TGGCTTGTTG GGATGAATTT 12600
CAAGTATTTT GGAATGAGGA CAGCCATAGA GACAAGGGCA AGAGAGAGGC GATTTAATAG 12660
ATTTTATGCC AATGGCTCCA CTTGAGTTTC TGATAAGAAC CCAGAACCCT TGGACTCCCC 12720
AGTAACATTG ATTGAGTTGT TTATGATACC TCATAGAATA TGAACTCAAA GGAGGTCAGT 12780
GAGTGGTGTG TGTGTGATTC TTTGCCAACT TCCAAGGTGG AGAAGCCTCT TCCAACTGCA 12840
GGCAGAGCAC AGGTGGCCCT GCTACTGGCT GCAGCTCCAG CCCTGCCTCC TTCTCTAGCA 12900
TATAAACAAT CCAACAGCCT CACTGAATCA CTGCTGTGCA GGGCAGGAAA GCTCCATGCA 12960
CATAGCCCAG CAAAGAGCAA CACAGAGCTG AAAGGAAGAC TCAGAGGAGA GAGATAAGTA 13020
AGGAAAGTAG TGATG 13035
GGTACCTGGT TATCTATTGG GACTGGTTGG ACAAGAGGGT GCAGCCCACG GAGGGTGAGC 60
CAAGCAGGGT GGGGCGTCGC CTCACCTGGG AAGCACAAGG GGTCGTGGAA TTTTCTCCCC 120
TACCCAAGGA AAGCCATAAG GGACTGAGCC TGAGGAACTG TGCACTCTGG CCCAGATACT 180
GCACTTTTCC CATGGTCTTT GCAACCCGCA GACCAGGAGA TTCCCTCCGG TGCCTATGCC 240 
ACCAGGGCCC TGGGTTTCAA GCACAAAACT GGGCAGCCAT TTGGGCAGAC ACCGAACTAG 300
CTGCAGGAGT TTTTTTTTTT TTTTTCCATA CCCCATTGGC ACCTGGAACG CCAGTGAGAC 360
AGAACCGTTC ACTCCCCTGG AAAGGGGGCT GAAACCAGGG ATCCAAGTGG TCTGGCTCGG 420 
TGGGCCCCAC CCCCATGGAG CCCAGCAAAC AAAGATTCAC TTGGCTTGAA ATTCTTGCTG 480
CCAGCACAGC AGCAGTCTGA GATTGACCTG GGACCCTCGA ACTTGGTTGG GTGCTGTGGG 540
GGGGCATCTT CCATTGCTGA GGCTTGAGTA GGTGGTTTTA CCTTCGCGGT GTAAACAAAG 600
CTGCTGGGAA GTTTGAACTG GGTGGAGCTC ACCACAGCTC AGTAAGGCCA CTGTGGCCAG 660 
ACTGCCTCTC TGGATTTCTC CTCTCTGGGA AGGATATCTC TGAAAAAAAG GCAGCAGCCC 720 
CAGTCAGGGA CTTATAGATG AAACCCCCAT CTCCCTGGGA CAGAGCCCCT CGGGGAAGAG 780 
GTGGCTTCCA CCATTGTGGA AGACTGTGTG GCAATTCCTC ACGGATTTAG AACTAGAGAT 840
ACCATTTGAC CCAGCAATCC CATTACTGGG TGTATACCCA TAGGATTATA AATCATTCTA 900
CTATAAAGAC ACATGCACAC TTATGTTTAT TGTAACACTA TTTACAATAG CAATGACCTG 960 
GAACCAATCC AAAAGCCCAT CAATGATAGA CTGAATAAAG AAAATGTGGC ACATATACAC 1020
TGTGGAATAC TATGCAGCCA TAAAAAAGGA TGAGTTCATG TCCTTTGCAG AGACATGGAT 1080
GAAGCTGGAA ACCATCATTC TCAGCAAACT AGCACAATAA CAGAAAACCA AACACTGCAT 1140
GTTGTCACTC ATAAGTGGGA GTTAAACAAT GAGAACACAT GGACACAGGG AGGGGAACGT 1200 
CACACACTGG GGCATGTCGG GGAGTGGGGG CCTACGGGAG GGATAGCATT AGCAGAAATA 1260 
CCTAATGTAG GTGACGGGTT GATGGGTGCA GCAAACCACC ATGGCACATA TACACCTATG 1320
TAATAAAACT GCACGTTCTG CACATGTACC CCAGAACTTA AAGTATAATT AATAATAATA 1380
ATAATTTCTG GGCATGTAAG TAGCTGTCTT TCAGGTTCTA CTTTGATACA TATTCTGAGA 1440 


GAATTAAACC TGTCAAAGAA ACCTTGACTT TCAATGGCAG GCACTGGAAT TGACCCTAAT 1500 
AATGTGTTTT GGGGTAAGCC TACTCATATT CTCAACCTGT CTGCAGTAGT CGTTAGAATC 1560
TGAACTTCCT GAAGTTCATG TGCAAAGTTG AGTTAATTGT TTAATATTCA ACAAGGATTA 1620
TGCCAGTAAG ATGGTAGGAA AATATTAGAT ATGTGTCATC ACTGCTGGTA TTATTTAAAC 1680
TGCAACATAT TTTAGCTGGC TGCTGATCTC AGCCACCATG CCTGCATTTT ATCTCTGTCT 1740
CGTGGTCTGC AACCTTGGAA GCTTTGAACT TAGCTCATAG AATCCTGGGC ATCAAGAACA 1800
TGTGGTTCTA ATGGCTAGAT AGGGAATGAG AGTAAAAGGA TTTTGCCCAC GGTCACGTGA 1860 
GTAAACAACA GATTTGGAGG GGTCTGGACT ACTGTGATGA CTTCATTCTG ACAATATGTT 1920
CCAGTTGTCC TTTCATTTCC TCCTAATCAC ATGTTTGGTC TGATTTGGCT GTTTCCCACC 1980
TTCCAATTCC TGCCTTCTCC AATGCTCCCT TCCGTAGGTC ACTCTGTGGC TCAGAGACCC 2040
TGCTTAGCAA GCGCCCAACC TTTCAATTAT TTGTTCAGTA AAACTTGAAC TCATGTCTCC 2100
CCTTCTTGAT AAAAAGAAAA TACGTTATGT AATGTCGGGT TACTCTATAA CTCTTGTCCT 2160 
GTCTCTCGGC AACTAGTGAA CTAACTGTTT TCATATTGAG CAAACGTTTA TGGAAGGACT 2220
GCCAAGAGTC AGGTACTAGG CTTGGTAATA TTCCCCGTTC TCTCTAGTCA AAGCCAACAC 2280
CAGCCAGACT TGCAGATCTA GGTCCCAAGC CCACTGCAGA TCACAGGCCA GGGTCTGGTC 2340 
TCCTCTGAGC TCCTTTGGGA GGGAAAGACA GAATTATTAA CACCCATTTT GTAGATTAGG 2400
CAACTGAGGC TGAGGAAGTT TAAATAACTC AGACAGGGCC TGCACGTCAG TCATATTCCA 2460 
A 2461 
GGTACCTGGTTATCTATTGGGACTGGTTGGACAAGAGGGTGCAGCCCACG

GAGGGTGAGCCAAGCAGGGTGGGGCGTCGCCTCACCTGGGAAGCACAAG 

GGGTCGTGGAATTTTCTCCCCTACCCAAGGAAAGCCATAAGGGACTGAGC

CTGAGGAACTGTGCACTCTGGCCCAGATACTGCACTTTTCCCATGGTCTTT 

GCAACCCGCAGACCAGGAGATTCCCTCCGGTGCCTATGCCACCAGGGCCC 

TGGGTTTCAAGCACAAAACTGGGCAGCCATTTGGGCAGACACCGAACTAG

CTGCAGGAGTTTTTTTTTTTTTTTTCCATACCCCATTGGCACCTGGAACGCC

AGTGAGACAGAACCGTTCACTCCCCTGGAAAGGGGGCTGAAACCAGGGA 

TCCAAGTGGTCTGGCTCGGTGGGCCCCACCCCCATGGAGCCCAGCAAACA 

AAGATTCACTTGGCTTGAAATTCTTGCTGCCAGCACAGCAGCAGTCTGAG 

ATTGACCTGGGACCCTCGAACTTGGTTGGGTGCTGTGGGGGGGCATCTTCC

ATTGCTGAGGCTTGAGTAGGTGGTTTTACCTTCGCGGTGTAAACAAAGCTG

CTGGGAAGTTTGAACTGGGTGGAGCTCACCACAGCTCAGTAAGGCCACTG

TGGCCAGACTGCCTCTCTGGATTTCTCCTCTCTGGGAAGGATATCTCTGAA 

AAAAAGGCAGCAGCCCCAGTCAGGGACTTATAGATGAAACCCCCATCTCC

CTGGGACAGAGCCCCTCGGGGAAGAGGTGGCTTCCACCATTGTGGAAGAC 

TGTGTGGCAATTCCTCACGGATTTAGAACTAGAGATACCATTTGACCCAGC 

AATCCCATTACTGGGTGTATACCCATAGGATTATAAATCATTCTACTATAA

AGACACATGCACACTTATGTTTATTGTAACACTATTTACAATAGCAATGAC 

CTGGAACCAATCCAAAAGCCCATCAATGATAGACTGAATAAAGAAAATGT

GGCACATATACACTGTGGAATACTATGCAGCCATAAAAAAGGATGAGTTC

ATGTCCTTTGCAGAGACATGGATGAAGCTGGAAACCATCATTCTCAGCAA 

ACTAGCACAATAACAGAAAACCAAACACTGCATGTTGTCACTCATAAGTG 

GGAGTTAAACAATGAGAACACATGGACACAGGGAGGGGAACGTCACACA 

CTGGGGCATGTCGGGGAGTGGGGGCCTACGGGAGGGATAGCATTAGCAG 

AAATACCTAATGTAGGTGACGGGTTGATGGGTGCAGCAAACCACCATGGC

ACATATACACCTATGTAATAAAACTGCACGTTCTGCACATGTACCCCAGA

ACTTAAAGTATAATTAATAATAATAATAATTTCTGGGCATGTAAGTAGCTG 

TCTTTCAGGTTCTACTTTGATACATATTCTGAGAGAATTAAACCTGTCAAA

GAAACCTTGACTTTCAATGGCAGGCACTGGAATTGACCCTAATAATGTGTT 

TTGGGGTAAGCCTACTCATATTCTCAACCTGTCTGCAGTAGTCGTTAGAAT 

CTGAACTTCCTGAAGTTCATGTGCAAAGTTGAGTTAATTGTTTAATATTCA

ACAAGGATTATGCCAGTAAGATGGTAGGAAAATATTAGATATGTGTCATC 

ACTGCTGGTATTATTTAAACTGCAACATATTTTAGCTGGCTGCTGATCTCA 

GCCACCATGCCTGCATTTTATCTCTGTCTCGTGGTCTGCAACCTTGGAAGC 

TTTGAACTTAGCTCATAGAATCCTGGGCATCAAGAACATGTGGTTCTAATG

GCTAGATAGGGAATGAGAGTAAAAGGATTTTGCCCACGGTCACGTGAGTA

AACAACAGATTTGGAGGGGTCTGGACTACTGTGATGACTTCATTCTGACA 

ATATGTTCCAGTTGTCCTTTCATTTCCTCCTAATCACATGTTTGGTCTGATT 

TGGCTGTTTCCCACCTTCCAATTCCTGCCTTCTCCAATGCTCCCTTCCGTAG 

GTCACTCTGTGGCTCAGAGACCCTGCTTAGCAAGCGCCCAACCTTTCAATT 

ATTTGTTCAGTAAAACTTGAACTCATGTCTCCCCTTCTTGATAAAAAGAAA

ATACGTTATGTAATGTCGGGTTACTCTATAACTCTTGTCCTGTCTCTCGGC

AACTAGTGAACTAACTGTTTTCATATTGAGCAAACGTTTATGGAAGGACT

GCCAAGAGTCAGGTACTAGGCTTGGTAATATTCCCCGTTCTCTCTAGTCAA 

AGCCAACACCAGCCAGACTTGCAGATCTAGGTCCCAAGCCCACTGCAGAT

CACAGGCCAGGGTCTGGTCTCCTCTGAGCTCCTTTGGGAGGGAAAGACAG 
AATTATTAACACCCATTTTGTAGATTAGGCAACTGAGGCTGAGGAAGTTT

AAATAACTCAGACAGGGCCTGCACGTCAGTCATATTCCAAGGATCCCTAC 

TCACTGTCTTCTCTCTACAGAACGAGATGTCTCTGGAGTCCATAGAAAGCC

CAGGAGCCTGGCTGGGCACGGTGGCTCCTGCCTGTAATCCCAGCACTTTG 

GGAGGCCGAGGCAGGCAGATCACCTGAGCTCAGGAGTTCAAGACCAGCC

TGGGCAACATGGCAAAACCCCATCTCTACTAAAAATACAAAAAATTAGCT

GGGCGTGGTGGTGCATGCCTCTAATCCCAGCTACTTGGGAGGCTGAGGCA

CAAGAATTGCTTGAGCCCAGGAGGCAGCAGTTGCAGTGAGCTGAGATTGT 

GCCAGTGCACTCCAGCCTGGGCAACAGAGCAAGATTCCATTTCAAAAACA

AAAACAAACACAAACAAACAAACAAAAATAGAAAGCCCAGGGACCACCT

GCGTCAGGTTCCCAGCCACACCTTTTTCTTGTCCTCCTCTGTCTCTGGCATC

TTCTCACAGGTTCCTAATTGTTTGTGGTTGCACAAATTCAAAATCCCAGAA

AAATTACCACTTCACACCCACTCAGATGGCTATTTTTTTTTTGAAGGAAGA

TAACAAGTGTTGACAAGAACATGGAGAAATTGGAATTCTCACCCATTGCT

GGTGAGAATGTAATACGGTGCTGCTGCTATGGAAAACAGCTTGGAGTTTC 

CTCAAA.\AGTTCAACAGAATTTCAATGTGACCCAGCAATTCCCCTCTAAGT 

TATAGATCTGAGAGGATTAAAAACAGTTACTAAAATACACGGACTCACAT 

ATTTCTAACAGTCCAATTCACAAGGGCCAAAAGGTG'CTAATAGCCCACAT 

GTCCATCGATGGATGGATAAATAA.'VTTGTGGTCTATCCATACAATGGAAT 

ATTATTCGGCCATAAATGGAATGAAGTACTGACGCATGCTACAGAATGGA 

TGA.\CCGCAAAAAAAATGGATGAACACATGCTACAGAATGGATAGCCTC 

ACTTTACTATGAAGTGAAGGCCAGAAACGAAGTCCATATArrOCATCATA 

CAAAATATCCAGAAGAGGGAAGCCCACAGAGACAGAATGTGCAATGGTG 

GATGCCAGGGTCTGGGGAGAGGGGAGAGTGGGGAGAAACTGCTCAACTG 

GTACAGGCTTTATTTTGGAATGATGGGAACATTTTGCAACTAGATAGAGG 

TAGTGATTGCAGAACACAGAATGTACTGAATTCCACTGATTTTTTTCACCT

TAAAATGGTTAATTTTCAGTCCTGAGATTGGATAATCATAAA ^\ A AATGGTT 

AATTTTATGTTATGTGAATTTCATCCCTATACATATTTTAAACCTCAGAAA 

TATACACTAGCAGGCATGGAACAGGTCACTGTGGTGCCTGCCAAGCCCGG 

TGATGTTATCTGGGGTCCCCGGCCAGCCTTAAGCCTCTTGCTGACCGGTGG 

AGGGCAGAACCTTTGCCCTAAAAGTATAATATCCACATGCTGGCATGATT 

CCTGGCCAGATGGCTTCTTTATTAGCAGTAATTGAAACTGCCTCGATACAG 

ACACTGTACCTTGCAACC.AAAAAATGACTCAACAATGATAATAAGGGTtA 

AGCTGGGCCTTTCTCTCTTTGCCAGTTAAATTATATTTATTATAGCTTGACA 

TGAAAAACAAAGCAACTCCAACAGGTATCACA.AGGGCAAAGG\CATGAA 

CATTTTATCAAAGAAGA.'AATGCAGCTGTCAAAAATACAGAA AT ATTCAAC 

CTTGTTCATAATAAAGTGGCTGGGCTCAGTGGTTCATGCCTGT A ATCCCAG 

TGCTTTGCAAGGCTGAGACAGGAGGATCATTTGAAGCCAGAAGTTCAAGA 

CCATCCTAGGCAAGTCAGTTCAATACCAGAC^ITCATGTCTAC - A A ACATC 

AAAAAATTAGCCAGGCATGGTGATGCATGCCTGTTGTCCCAGCTACTCAG 

GAGGCTGAGGCAGGAG.VATTGCTTGAGCCTGGGAGGCTGCGGTGGCGGT 

GAGCCATGATTGTGCCATTGTACTCCAGCCTGGGCAATGCAGCAAGACTG 

1 C FA A ATAACAAAAATAATAGTA AAG.\AAAGGATTGGGATGCCATTTACT 

TGCGTATTCAATACACAGAGTTAAAAGTAATTTCTACGTTTICTATTITTTT 

ATTACTAAAA.\AAGCTGGACCATTCTCACAGCCTGAy\ATGCTTr:TCACTTT

CCCTTCTTCTGTCCAAACACTTCTCTATGATAATGCAAACAGTCACTCCm 

AGGAAGACTTCACCCCAGGTAGTTCCAGATCCCCTTATCTCTGCCTTCCCA 

GAACTCCTGGTGTCTCTCCAGTTCCCTCCGTGTGGTGAAGTACCCTACCTA 
GGGTTTCAGTATGGCTCTGTCTGCAAAGGTCTTGTTCACACCTTCCCTTAT

GGTTCTGTTGCCCTGTGTTGTGTCATAGCACAGGGCACAGTGG^GAACCC 

ATTCACACTGATAGAGAGGGCCCCATGGTCCTGGAGATAACCATGTAACC 

GATCAGAATAAGGCATTGAGGGCTGGGTGTCAGGCGTGGGCTGCACTTGG 

GTGGGCAGGTCCCCTGGAAAGTCACTGGGTTTGGCAAGCTTCCTAGTAAC 

ATGTCTCTCTGGGGTCCCCCTTGGAACTTCATGCAAAAATGCTGGTTGCTG 

GTTTATTCTAGAGAGATGGTTCATTCCTTTCATTTGATrATCA.\.\GAAACr 

CATGTCCCAATTAAAGGTCATAAAGCCCAGTTTGTAAACTGAGATGATCr 

CAGCTGAATGAACTTGCTGACCCTCTGCTTTCCTCCAGCCTCTCGGTGCCC 

TTGAAATCATGTCGGTTCAAGCAGCCTCATGAGGCATTACAAAGTTTAATT 

ATTTCAGTGATTATTAAACCTTGTCCTGTGTTGACCCCAGGTG^ATCACAA 

GCTGAACTTCTGACAAGAACAAGCTATCATATTCTTTTCAATTACAGAAAA 

AAGTAAGTTAATTGATAGGATTTTTTTTGTTTAAAAAAAATGTTACTAGTT 

TTGAAAAGG^-AATATGTGCACATGGTAAACACTAAGAAGGTATAAGAGCA 

TAATGCTTTTATACTACTAAGAATAATGTTTTCTCTAAGTTTTTrTTGGTAG 

ATGCTTTCATCAGATTAAGAAAATTCCCTGCTATTAGTTGTTG.^AGGTnr 

TATATCATAAATGAAAGTTGAATATTATTATCATATATTATTAATATATTG 

TTATTGAACTATCAAAGCCTTTTCCTAAAACCATTGAGATGATC1TATAAC 

CATTCTCCTTTAACCTGTTGACGAGATCATTGGTATTTATACT4TTTCTCTG 

ttaaccattcttgagtctcaggtttaaattcaacttggtcatggtgtgtca

TCTTTGATCATTGCTGTCTGTGGCTTGCTACTGTTTTGTTTAGGATTITTGC

ACTGATGCTCATCAATGAGACIGGCATGCCATCTTCCTTTGCAGTCCTGAT 

TTTTTTCTGATTTGGATCATGTGGTTATGGCCCTCATGGAATGAGTTGGGC 

ATGATGCCTTTTTTTCATGTCTCTGGATTGATGGGACACTTTGGATTCTCTC 

CAGATGGCCCTCAATGGTCCCTGCCTCCTCATTGTTAGGCCCCTGGGCAAG 

cccttctcatttctggtaggcccaggaacctgtgggggttttgtttgtttgt 

TTGTTTCTTGAGTCGGAGTCTCACTCTGTCACCCAGGCTGGAGTTGGAGTG

caatggcccgatcttggctcactgcaacctccacctcccaga-itcaagcaa 

TTCTCCTGCCTCAGCCTCCTGAGTAGCTGGAATTACAGGCACCCACCGACA 

caccctgctaatttttgtatttttagtacagatggggtttcacaatattgg 

CCAAGCTGGTCTCGAACTCCTGATCTCATGATCTGCCCGGCITGGCCTCCC 

aaagtgttgagattacaagcatgaqccaccacacccagtg^vacctgtggt 

TTTTAGAAGCTCCCCATGCATGTGAATGCTGTGAGCATCCCAGGATGACA 

gccactgtgtgttcagctgttggaactgtgagaaagcaccagtgggacct 

TCTCCAGCACCTGCCTGCTGAGTTCATGGAAGAGGCTTGTTGGGGAGATG 

atgccctggctgactcctgaaggatggttaggaatgcaccagatggaagc 

TGGGTTGGACCCACTCTATGCTGAAGAACAGCTTGTGTGGACACAAGGAG 
ACACGGATATGTCATTTTTGTAGAGCCTGAGGAGTGTCCAATC'\CACCATT 
TGCTTAAAACATCATGCACACTTGGAAAAGTGGACTGAGACCGAATGAAG 

aagctaacagtggccagatcagaaagggtcttgtgttacttcctagagat 

ACTTAGATTTTATCCTGTGGGTGATAGGAGCAGTTGGAGGG^CTGAAGAC 
AAGGA.A-AGAAACATGTTTCAAGATCTATGTTTTTCAAGACGCTTTTCTGGT 
GGCTGAGTAGGGAATTCCCTGGATAAGTCCTGCCCAGGGTCAGGCAAAAC 

aagttagggggttactgaaataaggagtatgagaaatggtgtaggttgtg 

CTGACGTTTTGTAACACATCTCATGATGATCTTCATTTCCTTCACT/VATTTC

ctgtttcattaattcccttccacgtgctcttctg.aaatttgcctcacattct 

CTGATTTCTCT'rTTACCTGTTGGTTTCATCACCTTTTACTTTTTGCTTTCCTG 
GAAACACAAATGyVTTCTGATTGTGACATGTCAGAATTATTTnr\.i\CATTTG 
CCTTTCTGCTGAAACCATGAGTTCACTGAATACACAATTTAGTAAAGTGTA
GGATGCACATGTCGTTTTCGTGGTCACAACCAGCTCTGTAGCATTTTATAA 
CTACACTGGCAGTGTGCTGGGAGGTGTAGAGAGAAATATTTATCACATGT 
GTGGCTGACACAACCTGCCAAGTTATTTTAGGAGCCTCCrTGG.'^L^TCCCAG 
CAAGAATGCTACCGGCACAATTTGTAATCACAGCATCCTGCrCCATGCCTT 
GGCTTCATGGCATAGTCACTTCTGCAAGTCTCTTTCCAGCTGTCTGTTCCC 
ATGTCTATAAAGTATGAGTTAAATCATCCTAACACTACTCATCTTACAAAG 
TTTTCTTGCTGATGTTAAGAGAGTTGGGAAAGAACTGTATAAACTGTGAA 
GTGCCATGGAGATGTTAGTGGTTACTTTATCAAGAAATAGACACTCTAGA 
ATGGAGTAGAAAGCCAACAGlTATGATTGAGTCCTCCTCCTCTTCTTCnT 



TTATrAATTrATAAAGAAAAGAGGTTTAATTGACTCACAGTTCCATATGGC 

TGGGGAGGCCTCGGGAAACTCTCAGTCATAGCAGGAGGCAAAGGGGAAG 

AAGGCACCTTCTTCACAAGGCGGCAGGAGAGAGAGAGCTCCTGTTCTTTT 

TTGTCATAAAGTCTACAGAAGTGCTTATACTTCAGGACAAGGGCAGGCAG 

AGAGAAGGAAGGACATTGCTTCACCCCAGCCCTCACTGACGAGTTTGCTA 

GGGGACCTCACTTTGTCCCAGAGTAGGGCAGAACTCTGGCCACTACCCAT 
TCAGAAGGCCTGGGCTGCACTGCTAGTTCCTCACTAACTCTGTGTGGCCTT 
GGGCAAGGTTGGGCCTGTGTTAACAGATTATGACCCTGGGCTCTCAAGCT 
AGAGGATCTAAATTTGAATCCTGGCTCTGCTAAAGCAATTAGTGATGTAA 
ACTTTAATGGGTCAGTTAACCTTCCTGTGGCTTAGnTGCTCATCTGTAAA 
ATAGGGATCATAACAGTATCAATACCACATGATTGTTGGACAGATTGAAT 
CAGTTAATGCAGGGGAAGTACTTAGCATGACACGTATTCACTATCATTTCC 
TGGAGTAAGAGCTGTGTGTGAGTGGGTGTGAGCATGTGTGAAACCTTTTC 
TCTGCAATCTCAGTTAAGAAACCAATCCAGAATTTAAAGTTCAGGGCCTA 
AATGGGTGGTTATCTTCTCCCAGTTCCATCCTATCCCACCTTTGCTCTTCCT 
CCCGCCCACAGGAGCTGTTGGTCCTTGATTGGGCTGGAAGACCTGGTGGA 
CCCTAAGTGATCTATAAGAGGAGAATAGAGAACAGGGAATGTCTTCAAAA 
ATCTAGAGGGACACAGAGGCTGAGAGGCAGGCAGTCCTGCAGGGTCTTCT 
GATTGGGACAAGGAGAACCTTGGTCTTCACAGGCCAATTCTGGTCAGTTT 
CCCCCATGGACAGATGAGGAAACAGGCCCAGGAATATCCAAGGTCTCACA 
CTTCCCATCTGTCA.'ilGTCTTGlTGATTCTGTTGTATTCATGTCTCTCAAAGG 
GAGATAGAGTTTAGGGAAGAAAGAAGGATCAACTGTGTCTGATACCACTG 
GGAGCTTAAGTAAAGGGTTCTTTTACTTCATAGCATTTATCCCAATTTGTA 
ATTCAGTATTATTTGTGTGGCTGTTTGGTGTCTCTTTCTCCTATATGAGTGC 
TAGCTTCATAAGGGCAAGGATTTTGATTCTTTAATATITAGTGCTTGCCAC 
ATGCCCTGAACACAGCAGGCATACAGGCTAACCAACATACAGTGGCATGA 
AAGTCATGAAAGTGAGACACCTACCTCCTCCAGTGCCAAGAGAGCATAAC 
CATGCACCTGTCACTCTCCTCAACACCACCCCCAAGCATGAGGCCCAAAA 
GCATTAGCTAATCCCCTCCTCCAGCCACTAAAACTTAAAGGCCAGGTGTG 
GTGGCTCCCATCTGAAATCCCAGAACTTCAGGAGACAGCAGCAGGAGGAT 
CACTTGAGGCCAGGAGTTTGAGATCAGCCTGGGCAACATAGCTAGGTCCC 
ATCTGTACTAAAAATTAGCTGGGCGTTGTTGCATGCCTGTAGTCCCAGCTA 
CTAAGGAGGCTGAGGTGGGAGGATCACTTGAGCCCAGGAGGTGGAAACA 
ACAGTAAGCTATAATCACAGCACTGAACTCTAGCCTGGGCAACAGAGTGA 
CACCCTGCCTCAAAACAATTTTAAAAATA/\ATAAGAGCAAAACTTAGATA.' 
CCACGTGGTCACCCCAACATGCAAAATCAAGTTTTCCCCTACTG\GAAGA 
A i GGGGACrTGACAGCTGAGTTACAGAGAGATAATClTCTTCTTCTTTTrr 
TTTTTTTGGTTTACATCCTCAAGArCATGACTTGTGA,4.ATTTGAATCGAAT 
ACACATGTAATTCCAGAGCAATGTTGCCTCCGCATACCATCAGCA.\TTCAC

TTGGCTACTGGAAGTCAGGATAAGCTTCCCAGAAGAGAGGTACCACTTGG 

GCTACCAATATAAAAGGATGAAAATATCAGAGTGATGGTGTTCTTTACAA 

CGTTGAGTCCCTGGACAGCCTGTCCACTGATGCTGATATCTGAGCCTAATG 

CTTCTCTGAATGTTGAGATTGAAC1TTGATCCAATGAAACTAGAACGAGA 

AAGAAGATAAGTCTTTCATTGTTGATAAGGACATTATGTTTCTCATACITG 

TATGATTATTirrCCTTAGCTGTACTATAATTATCrGCTTATTTGTCrCTGC 

TCTATGTGCTTAGGGTACAAAGTTGACCAAGACCAACTTTGGTTGGAAGC 

ATAGTACTAAGAGCACAGTACTGAGAGCACAGTATTGAGAGCACAGCTTT 

AAAAAACATGATGAAGGCTTTAATACAGGAAATGAGCAGGGGAGAGGCA 

TGTGGTGGTTGGATGTATCTTCCTTGACACAGTCAGTGCAGCTCTCAGTAG 

TCAAGTCCCTACATGTTAGAAGATGTTACCTTCTGTGGAATTAAGTGGCAG 

AACTTGCCTTCAATTATTTTCCTTTGCAGAACAACACCAACTGCATTAGTT 

AGGACACAGTGCTGGCTGCATTTAAGTCCCAAGCGATGATTAGTCTCTCA 

CTGTTGGTATAGATTCAAACCAATCAGACCACCTCCTAAAGTTTGTAGGGC 

AGGTAAATCCTCATCTTAGAATAAAAATCATCTTACCAAGTATGTGTnTA 

GAGGCAAGAAGAAAACATATTTGTTTCTGTAAGAGTTTTGTTTAAAAAAA 

ATATAAGAAAGGCTCTCGGTTTAGGTGAGGTAATGAAGTTGTTGATAGTT 

ATCAGATGACACTGGAATCTTTACTTCTCTGAACGTGTTCTGTGCATCrCT 

CAGTGTGGGAACATAGAGAGGGAGATCCTCCAGCAATGCCACTGATATGG 

TCAGAAACTGCATCTTTCTTTCTCCCTGCTGAGATGAGATGGAGTCCTTTG 

TTCTAGAAGACCCATGGTGGTGCCGCTGGGAGTAACCCTTGAGACAGGAA 

CACAAATCCCAACCAATTTGTGGTTGGAGCCTTGAGTCrCACTATITCCCA 

TAGTGATGCGTAGCAGGGAATGGCAGGTGCACCAGAGCAGGAGAGGACC 

TAATATCTCCCTTCCTGTTAGCTTTTTATAAAGTTTTATTGTGATCAGTAGC 

AGTl^GGGAAGCTACTTGCAGTCACTGAGCCTCAGTTTCTACATCTGTAAAC 

TGGGGATAGTAGCATGGCCCCTACTTAATGTGCTCAGCAAAGCCACTGAA 

AGGAGACAGAAATGTATCTAAATTACCCTGGACTTTTATCCTA.CCTCTC1T 

GGGGATTGTCACCACCTTCCCATGTTTGTCCTTTTTGGTTTGATGCTTGCTG 

TCACTTCTTTCCTTAGGTGCCTCTCTGTACGGCrCTTTTATCCCAGGGATTC 

CAGAGTTACAGCACATGCATACCACCATCCAAGCATGTTTATTTGTCTCCT 

GCTTCACTAGGCl^GTCCCCAAGGAACATGTGGCTCCCGGCACACACCTGG 

CACAACACTGCACATGACATTCACCCACTTGGCCTTGAATCTGACAAGGA 

ATCTGGCATGATGTl^CACCCACTCAGGCCAGGTGCCGAGCAGCCCTGGAG 

GCTTAGGGGCCAGAGGGATGGGAAAAGGTGTCTTTCTGGGGTGAGTATCA 

GTTTCTGCAGGAGGGCTGAATGTGAGAAAGAATAAAGAGAG4AGGAAGC 

GAACAAGCACAGCTTAAACATCGCCTATTTCTATTGAGTTTT -\ AGAACGCT 

GTGATTTTGTTTGTCATGCAATCCATTCATCAGGCCAGGCAGACACAGAAC 

TTGGGTGTGAGTGACGATAATGAGCTGATATAATTTTCACACCCTCATCAC 

TGAGATCTCTCCCATCAGGAATGGGTCAGGGAGCTCACAGGTGGCAGCAA 

CTGCTATTACAGGCCTCATCTCTACCAGCTCCTGGGGCCTGCCCTCCTCCC 

ATTAG.AAAATCCTCCACTTGTCAAAAAGGAAGCCATTTGCTTTGAACTCCA 

ATTCCACCCCCAAGAGGCTGGGACCATCTTACTGGAGTCCTTGArGCTGTG 

TGACCTGCAGTGACCACTGCCCCATCATTGCTGGCTGAGGTGGTTGGGGTC 

CATCTGGCTATCTGGGCAGCTGTTCTCTTCTCTCCTTTCTCTCCTGTTTCCA

GACATGCAGTATTTCCAGAGAGAAGGGGCCACTCTTTGGCAA^GAACCTG 

TCrAACTTGCTATCTATGGCAGGACCTTTGAAGGGTTCACAGGAAGCAGC 

ACAAATTGATACTAriGCACCAAGCCATCAGCTCCATCTCATCCATGCCCT 
GTCTCTCCTTTAGGGGTCCCCTTGCCAACAGAATCACAGAGGACCAGCCT

GAAAGTGCAGAGACAGCAGCTGAGGCACAGCCAAGAGCTCTGGCTGTATT 

AATGACCTAAGAAGTCACCAGAAAGTCAGAAGGGATGACATGCAGAGGC 

ccagcaatctcagctaagtcaactccaccagcctttctagttgcccactgt 
gtgtacagcaccctggtagggaccagagccatgacagggaataagacta 

GACTATGCCCTTGAGGAGCTCACCTCTGTTCAGGGAAACAGGCGTGGAAA 

cacaatggtggtaaagaggaaagaggacaataggattgcatgaagggga 
tggaaggtgcccaggggaggaaatggttacatctgtgtgaggagtttggt 
gaggaaagactctaagagaaggctctgtctgtctgggtttggaaggatgt 

GTAGGAGTCTTCTAGGGGGCACAGGCACACTCCAGGCATAGGTAAAGATC 

tgtaggtgtggcttgttgggatgaatttcaagtattttggaatgaggaca 

gccatagagacaagggcaagagagaggcgatttaatagattttatgccaa 

tggctccacttgagtttctgataagaacccagaacccttggactccgcagt 

AACATTGATTGAGTTGTTTATGATACCTCATAGAATATGAACTCAAAGGA 

GGTCAGTGAGTGGTGTGTGTGTGATrCTTTGCCAACTTCCAAGGTGGAGA 

AGCCTCTTCCAACTGCAGGCAGAGCACAGGTGGCCCTGCTACTGGCTGCA 

GCTCCAGCCCTGCCTCCTTCTCTAGCATATAAACAATCCAACAGCCTCACT 

GAATCACTGCTGTGCAGGGCAGGAAAGCTCCATGCACATAGCCCAGCAAA 

GAGCAACACAGAGCTGAAAGGAAGCTTGCGGCCGCTTAACTGCAGAAGTT 

GGTCGTGAGGCACTGGGCAGGTAAGTATCAAGGTTACAAGACAGGTTTAA 

GGAGACCAATAGAAACTGGGCTTGTCGAGACAGAGAAGACTCTTGCGTTT 

CTGATAGGCACCTATTGGTCTTACTGACATCCACTTTGCCTTTCTCTCCACA 

GGTGTCCACTCCCAGGTTCAATTACAGCTCTTAAGCGGCCGCAAGCTTGGC 

ATTCCGGTACTGTTGGTAAAGCCACCATGGAAGACGCCAAAAACATAAAG 

AAAGGCCCGGCGCCATTCTATCCGCTGGAAGATGGAACCGCTGGAGAGCA 

ACTGCATAAGGCTATGAAGAGATACGCCCTGGTTCCTGGAACAATTGCTT 

TTACAGATGCACATATCGAGGTGGACATCACTTACGCTGAGTACTTCGAA 

ATGTCCGTTCGGTTGGCAGAAGCTATGAAACGATATGGGCTGAATACAAA 

TCACAGAATCGTCGTATGCAGTGAAAACTCTCTTCAATTCTTTATGCCGGT 

GTTGGGCGCGTTATTTATCGGAGTTGCAGTTGCGCCCGCGAACGACATTTA 

TAATGAACGTGAATTGCTCAACAGTATGGGCATTTCGCAGCCTACCGTGG

TGTTCGTTTCCAAAAAGGGGTTGCAAAAAATTTTGAACGTGCAAAAAAAG 

CTCCCAATCATCCAAAAAATTATTATCATGGATTCTAAAACGGATTACCAG 

GGATTTCAGTCGATGTACACGTTCGTCACATCTCATCTACCTCCCGGTTTT 

AATGAATACGATTTTGTGCCAGAGTCCTTCGATAGGGACAAGACAATTGC 

ACTGATCATGAACTCCTCTGGATCTACTGGTCTGCCTAAAGGTGTCGCTCT 

GCCTCATAGAACTGCCTGCGTGAGATTCTCGCATGCCAGAGATCCTATTTT 

TGGCAATCAAATCATTCCGGATACTGCGATTTTAAGTGTTGTTCCATTCCA 

TCACGGTTTTGGAATGTTTACTACACTCGGATATTTGATATGTGGATTTCG 

AGTCGTCTTAATGTATAGATTTGAAGAAGAGCTGTTTCTGAGGAGCCTTCA 

GGATTACAAGATTCAAAGTGCGCTGCTGGTGCCAACCCTATTCTCCTTCTT

cgccaaaagcactctgattgacaaatacgatttatctaatttacacgaaa 

TTGCTTCTGGTGGCGCTCCCCTCTCTAAGGAAGTCGGGGAAGCGGTTGCCA 

agaggttccatctgccaggtatcaggcaaggatatgggctcactgagact 

ACATCAGCTATTCTGATTACACCCGAGGGGGATGATAAACCGGGCGCGGT

cggtaaagttgttccattttttgaagcgaaggttgtggatctggataccgg
gaaaacgctgggcgttaatcaaagaggcgaactgtgtgtgagaggtccta 
tgattatgtccggttatgtaaacaatccggaagcgaccaacgccttgatt 
GACAAGGATGGATGGCTACATTCTGGAGACATAGCTTACTGGGACGAAGA

CGAACACTTCTTCATCGTTGACCGCCTGAAGTCTCTGATTAAGTACAAAGG 

CTATCAGGTGGCTCCCGCTGAATTGGAATCCATCTTGCTCCAACACCCCAA

CATCTTCGACGCAGGTGTCGCAGGTCTTCCCGACGATGACGCCGGTGAAC 

TTCCCGCCGCCGTTGTTGTTTTGGAGCACGGAAAGACGATGACGGAAAAA 

GAGATCGTGGATTACGTCGCCAGTCAAGTAACAACCGCGAAAAAGTTGCG 

CGGAGGAGTTGTGTTTGTGGACGAAGTACCGAAAGGTCTTACCGGAAAAC

TCGACGCAAGAAAAATCAGAGAGATCCTCATAAAGGCCAAGAAGGGGGG 

AAAGATCGCCGTGTAATTCTAGAGTCGGGGCGGCCGGCCGCTTCGAGCAG 

ACATGATAAGATACATTGATGAGTTTGGACAAACCACAACTAGAATGCAG 

TGAAAAAAATGCTTTATTTGTGAAATTTGTGATGCTATTGCTTTATTTGTA

ACCATTATAAGCTGCAATAAACAAGTTAACAACAACAATTGCATTCATTTT 

ATGTTTCAGGTTCAGGGGGAGGTGTGGGAGGTTTTTTAAAGCAAGTAAAA 

CCTCTACAAATGTGGTAAAATCGATAAGGATCGATCCGTCGAC
Figure 22 (panels A-D)



